Vulnerability History

| Date | High Risk | Low Risk |
|---|---|---|
| 2024-12-07 | 1 | 1 |

Audit Report Details
- 13032 Lines of Code
- 8 Open
- 0 Resolved

🚨 High Risk Vulnerabilities
⚠️ Low Risk Vulnerabilities

Vulnerable Code:
1---
2File: /.github/pull_request_template.md
3---
4
5## Describe your changes
6
7## Issue ticket number and link
8
9[Task Title](https://www.notion.so/Compute-SN-c27d35dd084e4c4d92374f55cdd293f2?p=f9b26856f1a6406892b5db46446260da&pm=s)
10
11## Checklist before requesting a review
12- [ ] I have performed a self-review of my code
13- [ ] I wrote tests.
14- [ ] Need to take care of performance?
15
16
17
18---
19File: /contrib/CODE_REVIEW_DOCS.md
20---
21
22# Code Review
23### Conceptual Review
24
25A review can be a conceptual review, where the reviewer leaves a comment
26 * `Concept (N)ACK`, meaning "I do (not) agree with the general goal of this pull
27 request",
28 * `Approach (N)ACK`, meaning `Concept ACK`, but "I do (not) agree with the
29 approach of this change".
30
31A `NACK` needs to include a rationale why the change is not worthwhile.
32NACKs without accompanying reasoning may be disregarded.
33After conceptual agreement on the change, code review can be provided. A review
34begins with `ACK BRANCH_COMMIT`, where `BRANCH_COMMIT` is the top of the PR
35branch, followed by a description of how the reviewer did the review. The
36following language is used within pull request comments:
37
38 - "I have tested the code", involving change-specific manual testing in
39 addition to running the unit, functional, or fuzz tests, and in case it is
40 not obvious how the manual testing was done, it should be described;
41 - "I have not tested the code, but I have reviewed it and it looks
42 OK, I agree it can be merged";
43 - A "nit" refers to a trivial, often non-blocking issue.
44
45### Code Review
46Project maintainers reserve the right to weigh the opinions of peer reviewers
47using common sense judgement and may also weigh based on merit. Reviewers that
48have demonstrated a deeper commitment and understanding of the project over time
49or who have clear domain expertise may naturally have more weight, as one would
50expect in all walks of life.
51
52Where a patch set affects consensus-critical code, the bar will be much
53higher in terms of discussion and peer review requirements, keeping in mind that
54mistakes could be very costly to the wider community. This includes refactoring
55of consensus-critical code.
56
57Where a patch set proposes to change the Bittensor consensus, it must have been
58discussed extensively on the discord server and other channels, be accompanied by a widely
59discussed BIP and have a generally widely perceived technical consensus of being
60a worthwhile change based on the judgement of the maintainers.
61
62### Finding Reviewers
63
64As most reviewers are themselves developers with their own projects, the review
65process can be quite lengthy, and some amount of patience is required. If you find
66that you've been waiting for a pull request to be given attention for several
67months, there may be a number of reasons for this, some of which you can do something
68about:
69
70 - It may be because of a feature freeze due to an upcoming release. During this time,
71 only bug fixes are taken into consideration. If your pull request is a new feature,
72 it will not be prioritized until after the release. Wait for the release.
73 - It may be because the changes you are suggesting do not appeal to people. Rather than
74 nits and critique, which require effort and means they care enough to spend time on your
75 contribution, thundering silence is a good sign of widespread (mild) dislike of a given change
76 (because people don't assume *others* won't actually like the proposal). Don't take
77 that personally, though! Instead, take another critical look at what you are suggesting
78 and see if it: changes too much, is too broad, doesn't adhere to the
79 [developer notes](DEVELOPMENT_WORKFLOW.md), is dangerous or insecure, is messily written, etc.
80 Identify and address any of the issues you find. Then ask e.g. on IRC if someone could give
81 their opinion on the concept itself.
82 - It may be because your code is too complex for all but a few people, and those people
83 may not have realized your pull request even exists. A great way to find people who
84 are qualified and care about the code you are touching is the
85 [Git Blame feature](https://docs.github.com/en/github/managing-files-in-a-repository/managing-files-on-github/tracking-changes-in-a-file). Simply
86 look up who last modified the code you are changing and see if you can find
87 them and give them a nudge. Don't be incessant about the nudging, though.
88 - Finally, if all else fails, ask on IRC or elsewhere for someone to give your pull request
89 a look. If you think you've been waiting for an unreasonably long time (say,
90 more than a month) for no particular reason (a few lines changed, etc.),
91 this is totally fine. Try to return the favor when someone else is asking
92 for feedback on their code, and the universe balances out.
93 - Remember that the best thing you can do while waiting is give review to others!
94
95
96---
97File: /contrib/CONTRIBUTING.md
98---
99
100# Contributing to Bittensor Subnet Development
101
102The following is a set of guidelines for contributing to the Bittensor ecosystem. These are **HIGHLY RECOMMENDED** guidelines, but not hard-and-fast rules. Use your best judgment, and feel free to propose changes to this document in a pull request.
103
104## Table Of Contents
1051. [How Can I Contribute?](#how-can-i-contribute)
106 1. [Communication Channels](#communication-channels)
107 1. [Code Contribution General Guidelines](#code-contribution-general-guidelines)
108 1. [Pull Request Philosophy](#pull-request-philosophy)
109 1. [Pull Request Process](#pull-request-process)
110 1. [Addressing Feedback](#addressing-feedback)
111 1. [Squashing Commits](#squashing-commits)
112 1. [Refactoring](#refactoring)
113 1. [Peer Review](#peer-review)
114 1. [Suggesting Enhancements and Features](#suggesting-enhancements-and-features)
115
116
117## How Can I Contribute?
118TODO(developer): Define your desired contribution procedure.
119
120## Communication Channels
121TODO(developer): Place your communication channels here
122
123> Please follow the Bittensor Subnet [style guide](./STYLE.md) regardless of your contribution type.
124
125Here is a high-level summary:
126- Code consistency is crucial; adhere to established programming language conventions.
127- Use `black` to format your Python code; it ensures readability and consistency.
128- Write concise Git commit messages; summarize changes in ~50 characters.
129- Follow these six commit rules:
130 - Atomic Commits: Focus on one task or fix per commit.
131 - Subject and Body Separation: Use a blank line to separate the subject from the body.
132 - Subject Line Length: Keep it under 50 characters for readability.
133 - Imperative Mood: Write subject line as if giving a command or instruction.
134 - Body Text Width: Wrap text manually at 72 characters.
135 - Body Content: Explain what changed and why, not how.
136- Make use of your commit messages to simplify project understanding and maintenance.
137
138> For clear examples of each of the commit rules, see the style guide's [rules](./STYLE.md#the-six-rules-of-a-great-commit) section.
139
140### Code Contribution General Guidelines
141
142> Review the Bittensor Subnet [style guide](./STYLE.md) and [development workflow](./DEVELOPMENT_WORKFLOW.md) before contributing.
143
144
145#### Pull Request Philosophy
146
147Patchsets and enhancements should always be focused. A pull request could add a feature, fix a bug, or refactor code, but it should not contain a mixture of these. Please also avoid 'super' pull requests which attempt to do too much, are overly large, or overly complex as this makes review difficult.
148
149Specifically, pull requests must adhere to the following criteria:
150- Contain fewer than 50 files. PRs with more than 50 files will be closed.
151- If a PR introduces a new feature, it *must* include corresponding tests.
152- Other PRs (bug fixes, refactoring, etc.) should ideally also have tests, as they provide proof of concept and prevent regression.
153- Categorize your PR properly by using GitHub labels. This aids in the review process by informing reviewers about the type of change at a glance.
154- Make sure your code includes adequate comments. These should explain why certain decisions were made and how your changes work.
155- If your changes are extensive, consider breaking your PR into smaller, related PRs. This makes your contributions easier to understand and review.
156- Be active in the discussion about your PR. Respond promptly to comments and questions to help reviewers understand your changes and speed up the acceptance process.
157
158Generally, all pull requests must:
159
160 - Have a clear use case, fix a demonstrable bug or serve the greater good of the project (e.g. refactoring for modularisation).
161 - Be well peer-reviewed.
162 - Follow code style guidelines.
163 - Not break the existing test suite.
164 - Where bugs are fixed, where possible, there should be unit tests demonstrating the bug and also proving the fix.
165 - Change relevant comments and documentation when behaviour of code changes.
166
167#### Pull Request Process
168
169Please follow these steps to have your contribution considered by the maintainers:
170
171*Before* creating the PR:
1721. Read the [development workflow](./DEVELOPMENT_WORKFLOW.md) defined for this repository to understand our workflow.
1732. Ensure your PR meets the criteria stated in the 'Pull Request Philosophy' section.
1743. Include relevant tests for any fixed bugs or new features as stated in the [testing guide](./TESTING.md).
1754. Ensure your commit messages are clear and concise. Include the issue number if applicable.
1765. If you have multiple commits, rebase them into a single commit using `git rebase -i`.
1776. Explain what your changes do and why you think they should be merged in the PR description consistent with the [style guide](./STYLE.md).
178
179*After* creating the PR:
1801. Verify that all [status checks](https://help.github.com/articles/about-status-checks/) are passing after you submit your pull request.
1812. Label your PR using GitHub's labeling feature. The labels help categorize the PR and streamline the review process.
1823. Document your code with comments that provide a clear understanding of your changes. Explain any non-obvious parts of your code or design decisions you've made.
1834. If your PR has extensive changes, consider splitting it into smaller, related PRs. This reduces the cognitive load on the reviewers and speeds up the review process.
184
185Please be responsive and participate in the discussion on your PR! This aids in clarifying any confusion or concerns and leads to quicker resolution and merging of your PR.
186
187> Note: If your changes are not ready for merge but you want feedback, create a draft pull request.
188
189Following these criteria will aid in quicker review and potential merging of your PR.
190While the prerequisites above must be satisfied prior to having your pull request reviewed, the reviewer(s) may ask you to complete additional design work, tests, or other changes before your pull request can be ultimately accepted.
191
192When you are ready to submit your changes, create a pull request:
193
194> **Always** follow the [style guide](./STYLE.md) and [development workflow](./DEVELOPMENT_WORKFLOW.md) before submitting pull requests.
195
196After you submit a pull request, it will be reviewed by the maintainers. They may ask you to make changes. Please respond to any comments and push your changes as a new commit.
197
198> Note: Be sure to merge the latest from "upstream" before making a pull request:
199
200```bash
201git remote add upstream https://github.com/opentensor/bittensor.git # TODO(developer): replace with your repo URL
202git fetch upstream
203git merge upstream/staging  # or whichever upstream branch your PR targets
204git push origin <your-branch-name>
205```
206
207#### Addressing Feedback
208
209After submitting your pull request, expect comments and reviews from other contributors. You can add more commits to your pull request by committing them locally and pushing to your fork.
210
211You are expected to reply to any review comments before your pull request is merged. You may update the code or reject the feedback if you do not agree with it, but you should express so in a reply. If there is outstanding feedback and you are not actively working on it, your pull request may be closed.
212
213#### Squashing Commits
214
215If your pull request contains fixup commits (commits that change the same line of code repeatedly) or too fine-grained commits, you may be asked to [squash](https://git-scm.com/docs/git-rebase#_interactive_mode) your commits before it will be reviewed. The basic squashing workflow is shown below.
216
217 git checkout your_branch_name
218 git rebase -i HEAD~n
219 # n is normally the number of commits in the pull request.
220 # Set commits (except the one in the first line) from 'pick' to 'squash', save and quit.
221 # On the next screen, edit/refine commit messages.
222 # Save and quit.
223 git push -f # (force push to GitHub)
224
225Please update the resulting commit message, if needed. It should read as a coherent message. In most cases, this means not just listing the interim commits.
226
227If your change contains a merge commit, the above workflow may not work and you will need to remove the merge commit first. See the next section for details on how to rebase.
228
229Please refrain from creating several pull requests for the same change. Use the pull request that is already open (or was created earlier) to amend changes. This preserves the discussion and review that happened earlier for the respective change set.
230
231The length of time required for peer review is unpredictable and will vary from pull request to pull request.
232
233#### Refactoring
234
235Refactoring is a necessary part of any software project's evolution. The following guidelines cover refactoring pull requests for the project.
236
237There are three categories of refactoring: code-only moves, code style fixes, and code refactoring. In general, refactoring pull requests should not mix these three kinds of activities in order to make refactoring pull requests easy to review and uncontroversial. In all cases, refactoring PRs must not change the behaviour of code within the pull request (bugs must be preserved as is).
238
239Project maintainers aim for a quick turnaround on refactoring pull requests, so where possible keep them short, uncomplex and easy to verify.
240
241Pull requests that refactor the code should not be made by new contributors. It requires a certain level of experience to know where the code belongs and to understand the full ramifications (including the rebase effort for open pull requests). Trivial pull requests or pull requests that refactor the code with no clear benefits may be immediately closed by the maintainers to reduce unnecessary review workload.
242
243#### Peer Review
244
245Anyone may participate in peer review which is expressed by comments in the pull request. Typically reviewers will review the code for obvious errors, as well as test out the patch set and opine on the technical merits of the patch. Project maintainers take into account the peer review when determining if there is consensus to merge a pull request (remember that discussions may have taken place elsewhere, not just on GitHub). The following language is used within pull-request comments:
246
247- ACK means "I have tested the code and I agree it should be merged";
248- NACK means "I disagree this should be merged", and must be accompanied by sound technical justification. NACKs without accompanying reasoning may be disregarded;
249- utACK means "I have not tested the code, but I have reviewed it and it looks OK, I agree it can be merged";
250- Concept ACK means "I agree in the general principle of this pull request";
251- Nit refers to trivial, often non-blocking issues.
252
253Reviewers should include the commit(s) they have reviewed in their comments. This can be done by copying the commit SHA1 hash.
254
255A pull request that changes consensus-critical code is considerably more involved than a pull request that adds a feature to the wallet, for example. Such patches must be reviewed and thoroughly tested by several reviewers who are knowledgeable about the changed subsystems. Where new features are proposed, it is helpful for reviewers to try out the patch set on a test network and indicate that they have done so in their review. Project maintainers will take this into consideration when merging changes.
256
257For a more detailed description of the review process, see the [Code Review Guidelines](CODE_REVIEW_DOCS.md).
258
259> **Note:** If you find a **Closed** issue that seems like it is the same thing that you're experiencing, open a new issue and include a link to the original issue in the body of your new one.
260
261#### How Do I Submit A (Good) Bug Report?
262
263Please track bugs as GitHub issues.
264
265Explain the problem and include additional details to help maintainers reproduce the problem:
266
267* **Use a clear and descriptive title** for the issue to identify the problem.
268* **Describe the exact steps which reproduce the problem** in as many details as possible. For example, start by explaining how you started the application, e.g. which command exactly you used in the terminal, or how you started Bittensor otherwise. When listing steps, **don't just say what you did, but explain how you did it**. For example, if you ran with a set of custom configs, explain if you used a config file or command line arguments.
269* **Provide specific examples to demonstrate the steps**. Include links to files or GitHub projects, or copy/pasteable snippets, which you use in those examples. If you're providing snippets in the issue, use [Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines).
270* **Describe the behavior you observed after following the steps** and point out what exactly is the problem with that behavior.
271* **Explain which behavior you expected to see instead and why.**
272* **Include screenshots and animated GIFs** which show you following the described steps and clearly demonstrate the problem. You can use [this tool](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [this tool](https://github.com/colinkeenan/silentcast) or [this tool](https://github.com/GNOME/byzanz) on Linux.
273* **If you're reporting that Bittensor crashed**, include a crash report with a stack trace from the operating system. On macOS, the crash report will be available in `Console.app` under "Diagnostic and usage information" > "User diagnostic reports". Include the crash report in the issue in a [code block](https://help.github.com/articles/markdown-basics/#multiple-lines), a [file attachment](https://help.github.com/articles/file-attachments-on-issues-and-pull-requests/), or put it in a [gist](https://gist.github.com/) and provide link to that gist.
274* **If the problem is related to performance or memory**, include a CPU profile capture with your report, if you're using a GPU then include a GPU profile capture as well. Look into the [PyTorch Profiler](https://pytorch.org/tutorials/recipes/recipes/profiler_recipe.html) to look at memory usage of your model.
275* **If the problem wasn't triggered by a specific action**, describe what you were doing before the problem happened and share more information using the guidelines below.
276
277Provide more context by answering these questions:
278
279* **Did the problem start happening recently** (e.g. after updating to a new version) or was this always a problem?
280* If the problem started happening recently, **can you reproduce the problem in an older version of Bittensor?**
281* **Can you reliably reproduce the issue?** If not, provide details about how often the problem happens and under which conditions it normally happens.
282
283Include details about your configuration and environment:
284
285* **Which version of Bittensor Subnet are you using?**
286* **What commit hash are you on?** You can get the exact commit hash by checking `git log` and pasting the full commit hash.
287* **What's the name and version of the OS you're using**?
288* **Are you running Bittensor Subnet in a virtual machine?** If so, which VM software are you using and which operating systems and versions are used for the host and the guest?
289* **Are you running Bittensor Subnet in a dockerized container?** If so, have you made sure that your docker container contains your latest changes and is up to date with Master branch?
290
291### Suggesting Enhancements and Features
292
293This section guides you through submitting an enhancement suggestion, including completely new features and minor improvements to existing functionality. Following these guidelines helps maintainers and the community understand your suggestion :pencil: and find related suggestions :mag_right:.
294
295When you are creating an enhancement suggestion, please [include as many details as possible](#how-do-i-submit-a-good-enhancement-suggestion). Fill in [the template](https://bit.ly/atom-behavior-pr), including the steps that you imagine you would take if the feature you're requesting existed.
296
297#### Before Submitting An Enhancement Suggestion
298
299* **Check the [debugging guide](./DEBUGGING.md)** for tips — you might discover that the enhancement is already available. Most importantly, check if you're using the latest version of the project first.
300
301#### How to Submit A (Good) Feature Suggestion
302
303* **Use a clear and descriptive title** for the issue to identify the problem.
304* **Provide a step-by-step description of the suggested enhancement** in as many details as possible.
305* **Provide specific examples to demonstrate the steps**. Include copy/pasteable snippets which you use in those examples, as [Markdown code blocks](https://help.github.com/articles/markdown-basics/#multiple-lines).
306* **Describe the current behavior** and **explain which behavior you expected to see instead** and why.
307* **Include screenshots and animated GIFs** which help you demonstrate the steps or point out the part of the project which the suggestion is related to. You can use [this tool](https://www.cockos.com/licecap/) to record GIFs on macOS and Windows, and [this tool](https://github.com/colinkeenan/silentcast) or [this tool](https://github.com/GNOME/byzanz) on Linux.
308* **Explain why this enhancement would be useful** to most users.
309* **List some other text editors or applications where this enhancement exists.**
310* **Specify the name and version of the OS you're using.**
311
312Thank you for considering contributing to Bittensor! Any help is greatly appreciated along this journey to incentivize open and permissionless intelligence.
313
314
315
316---
317File: /contrib/DEVELOPMENT_WORKFLOW.md
318---
319
320# Bittensor Subnet Development Workflow
321
322This is a highly advisable workflow to follow to keep your subtensor project organized and foster ease of contribution.
323
324## Table of contents
325
326- [Bittensor Subnet Development Workflow](#bittensor-subnet-development-workflow)
327 - [Main Branches](#main-branches)
328 - [Development Model](#development-model)
329 - [Feature Branches](#feature-branches)
330 - [Release Branches](#release-branches)
331 - [Hotfix Branches](#hotfix-branches)
332 - [Git Operations](#git-operations)
333 - [Creating a Feature Branch](#creating-a-feature-branch)
334 - [Merging Feature Branch into Staging](#merging-feature-branch-into-staging)
335 - [Creating a Release Branch](#creating-a-release-branch)
336 - [Finishing a Release Branch](#finishing-a-release-branch)
337 - [Creating a Hotfix Branch](#creating-a-hotfix-branch)
338 - [Finishing a Hotfix Branch](#finishing-a-hotfix-branch)
339 - [Continuous Integration (CI) and Continuous Deployment (CD)](#continuous-integration-ci-and-continuous-deployment-cd)
340 - [Versioning and Release Notes](#versioning-and-release-notes)
341 - [Pending Tasks](#pending-tasks)
342
343## Main Branches
344
345Bittensor's codebase consists of two main branches: **main** and **staging**.
346
347**main**
348- This is Bittensor's live production branch, which should only be updated by the core development team. This branch is protected, so refrain from pushing or merging into it unless authorized.
349
350**staging**
351- This branch is continuously updated and is where you propose and merge changes. It's essentially Bittensor's active development branch.
352
353## Development Model
354
355### Feature Branches
356
357- Branch off from: `staging`
358- Merge back into: `staging`
359- Naming convention: `feature/<ticket>/<descriptive-sentence>`
360
361Feature branches are used to develop new features for upcoming or future releases. They exist as long as the feature is in development, but will eventually be merged into `staging` or discarded. Always delete your feature branch after merging to avoid unnecessary clutter.
362
363### Release Branches
364
365- Branch off from: `staging`
366- Merge back into: `staging` and then `main`
367- Naming convention: `release/<version>/<descriptive-message>/<creator's-name>`
368
369Release branches support the preparation of a new production release, allowing for minor bug fixes and preparation of metadata (version number, configuration, etc). All new features should be merged into `staging` and wait for the next big release.
370
371### Hotfix Branches
372
373General workflow:
374
375- Branch off from: `main` or `staging`
376- Merge back into: `staging` then `main`
377- Naming convention: `hotfix/<version>/<descriptive-message>/<creator's-name>`
378
379Hotfix branches are meant for quick fixes in the production environment. When a critical bug in a production version must be resolved immediately, a hotfix branch is created.
380
381## Git Operations
382
383#### Create a feature branch
384
3851. Branch from the **staging** branch.
386 1. Command: `git checkout -b feature/my-feature staging`
387
388> Rebase frequently with the updated staging branch so you do not face big conflicts before submitting your pull request. Remember, syncing your changes with other developers could also help you avoid big conflicts.
389
390#### Merge feature branch into staging
391
392In other words, integrate your changes into a branch that will be tested and prepared for release.
393
3941. Switch branch to staging: `git checkout staging`
3952. Merging feature branch into staging: `git merge --no-ff feature/my-feature`
3963. Pushing changes to staging: `git push origin staging`
3974. Delete feature branch: `git branch -d feature/my-feature` (alternatively, this can be navigated on the GitHub web UI)
398
399This operation is done by GitHub when merging a PR.
400
401So, what you have to keep in mind is:
402- Open the PR against the `staging` branch.
403- After merging a PR you should delete your feature branch. This will be strictly enforced.
404
405#### Creating a release branch
406
4071. Create branch from staging: `git checkout -b release/3.4.0/descriptive-message/creator's_name staging`
4082. Updating version with major or minor: `./scripts/update_version.sh major|minor`
4093. Commit file changes with new version: `git commit -a -m "Updated version to 3.4.0"`
410
411
412#### Finishing a Release Branch
413
414This involves releasing stable code and generating a new version for bittensor.
415
4161. Switch branch to main: `git checkout main`
4172. Merge release branch into main: `git merge --no-ff release/3.4.0/optional-descriptive-message`
4183. Tag changeset: `git tag -a v3.4.0 -m "Releasing v3.4.0: some comment about it"`
4194. Push changes to main: `git push origin main`
4205. Push tags to origin: `git push origin --tags`
421
422To keep the changes made in the __release__ branch, we need to merge those back into `staging`:
423
424- Switch branch to staging: `git checkout staging`.
425- Merging release branch into staging: `git merge --no-ff release/3.4.0/optional-descriptive-message`
426
427This step may well lead to a merge conflict (probably even, since we have changed the version number). If so, fix it and commit.
428
429
430#### Creating a hotfix branch
4311. Create branch from main: `git checkout -b hotfix/3.3.4/descriptive-message/creator's-name main`
4322. Update patch version: `./scripts/update_version.sh patch`
4333. Commit file changes with new version: `git commit -a -m "Updated version to 3.3.4"`
4344. Fix the bug and commit the fix: `git commit -m "Fixed critical production issue X"`
435
436#### Finishing a Hotfix Branch
437
438Finishing a hotfix branch involves merging the bugfix into both `main` and `staging`.
439
4401. Switch branch to main: `git checkout main`
4412. Merge hotfix into main: `git merge --no-ff hotfix/3.3.4/optional-descriptive-message`
4423. Tag new version: `git tag -a v3.3.4 -m "Releasing v3.3.4: descriptive comment about the hotfix"`
4434. Push changes to main: `git push origin main`
4445. Push tags to origin: `git push origin --tags`
4456. Switch branch to staging: `git checkout staging`
4467. Merge hotfix into staging: `git merge --no-ff hotfix/3.3.4/descriptive-message/creator's-name`
4478. Push changes to origin/staging: `git push origin staging`
4489. Delete hotfix branch: `git branch -d hotfix/3.3.4/optional-descriptive-message`
449
450The one exception to the rule here is that, **when a release branch currently exists, the hotfix changes need to be merged into that release branch, instead of** `staging`. Back-merging the bugfix into the __release__ branch will eventually result in the bugfix being merged into `staging` too, when the release branch is finished. (If work in `staging` immediately requires this bugfix and cannot wait for the release branch to be finished, you may safely merge the bugfix into `staging` right away as well.)
451
452Finally, we remove the temporary branch:
453
454- `git branch -d hotfix/3.3.4/optional-descriptive-message`
455## Continuous Integration (CI) and Continuous Deployment (CD)
456
457Continuous Integration (CI) is a software development practice where members of a team integrate their work frequently. Each integration is verified by an automated build and test process to detect integration errors as quickly as possible.
458
459Continuous Deployment (CD) is a software engineering approach in which software functionalities are delivered frequently through automated deployments.
460
461- **CircleCI jobs**: Create jobs in CircleCI to automate merging staging into main and cutting the release version (needed to release code), and building and testing Bittensor (needed to merge PRs).
462
463> It is highly recommended to set up your own CircleCI pipeline for your subnet.
464
465## Versioning and Release Notes
466
467Semantic versioning helps keep track of the different versions of the software. When code is merged into main, generate a new version.
468
469Release notes provide documentation for each version released to the users, highlighting the new features, improvements, and bug fixes. When merged into main, generate GitHub release and release notes.
470
471## Pending Tasks
472
473Follow these steps when you are contributing to the bittensor subnet:
474
475- Determine if main and staging are different
476- Determine what is in staging that is not merged yet
477 - Document not released developments
478 - When merged into staging, generate information about what's merged into staging but not released.
479 - When merged into main, generate GitHub release and release notes.
480- CircleCI jobs
481 - Merge staging into main and release version (needed to release code)
482 - Build and Test Bittensor (needed to merge PRs)
483
484This document can be improved as the Bittensor project continues to develop and change.
485
486
487
488---
489File: /contrib/STYLE.md
490---
491
492# Style Guide
493
494A project’s long-term success rests (among other things) on its maintainability, and a maintainer has few tools more powerful than his or her project’s log. It’s worth taking the time to learn how to care for one properly. What may be a hassle at first soon becomes habit, and eventually a source of pride and productivity for all involved.
495
496Most programming languages have well-established conventions as to what constitutes idiomatic style, i.e. naming, formatting and so on. There are variations on these conventions, of course, but most developers agree that picking one and sticking to it is far better than the chaos that ensues when everybody does their own thing.
497
498# Table of Contents
4991. [Code Style](#code-style)
5002. [Naming Conventions](#naming-conventions)
5013. [Git Commit Style](#git-commit-style)
5024. [The Six Rules of a Great Commit](#the-six-rules-of-a-great-commit)
503 - [1. Atomic Commits](#1-atomic-commits)
504 - [2. Separate Subject from Body with a Blank Line](#2-separate-subject-from-body-with-a-blank-line)
505 - [3. Limit the Subject Line to 50 Characters](#3-limit-the-subject-line-to-50-characters)
506 - [4. Use the Imperative Mood in the Subject Line](#4-use-the-imperative-mood-in-the-subject-line)
507 - [5. Wrap the Body at 72 Characters](#5-wrap-the-body-at-72-characters)
508 - [6. Use the Body to Explain What and Why vs. How](#6-use-the-body-to-explain-what-and-why-vs-how)
5095. [Tools Worth Mentioning](#tools-worth-mentioning)
510 - [Using `--fixup`](#using---fixup)
511 - [Interactive Rebase](#interactive-rebase)
5126. [Pull Request and Squashing Commits Caveats](#pull-request-and-squashing-commits-caveats)
513
514
515### Code style
516
517#### General Style
518Python's official style guide is PEP 8, which provides conventions for writing code for the main Python distribution. Here are some key points:
519
520- `Indentation:` Use 4 spaces per indentation level.
521
522- `Line Length:` Limit all lines to a maximum of 79 characters.
523
524- `Blank Lines:` Surround top-level function and class definitions with two blank lines. Method definitions inside a class are surrounded by a single blank line.
525
526- `Imports:` Imports should usually be on separate lines and should be grouped in the following order:
527
528 - Standard library imports.
529 - Related third party imports.
530 - Local application/library specific imports.
531- `Whitespace:` Avoid extraneous whitespace in the following situations:
532
533 - Immediately inside parentheses, brackets or braces.
534 - Immediately before a comma, semicolon, or colon.
535 - Immediately before the open parenthesis that starts the argument list of a function call.
536- `Comments:` Comments should be complete sentences and should be used to clarify code; they are not a substitute for clearly written code.
537
538#### For Python
539
540- `List Comprehensions:` Use list comprehensions for concise and readable creation of lists.
541
542- `Generators:` Use generators when dealing with large amounts of data to save memory.
543
544- `Context Managers:` Use context managers (with statement) for resource management.
545
546- `String Formatting:` Use f-strings for formatting strings in Python 3.6 and above.
547
548- `Error Handling:` Use exceptions for error handling whenever possible.
549
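As a minimal sketch, for illustration only, here is a small function that combines several of these idioms (the file format and all names are hypothetical, not taken from this codebase):

```python
# Illustrative only: context manager, list comprehension, f-string, exceptions.
from pathlib import Path


def summarize_scores(path: str) -> str:
    # Context manager closes the file even if an error occurs.
    with Path(path).open() as fh:
        # List comprehension builds the list of scores concisely.
        scores = [float(line) for line in fh if line.strip()]

    if not scores:
        # Signal error conditions with exceptions rather than sentinel values.
        raise ValueError(f"no scores found in {path}")

    # f-string formatting (Python 3.6+).
    return f"{len(scores)} scores, mean {sum(scores) / len(scores):.2f}"
```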
550#### More details
551
552Use `black` to format your Python code before committing; it keeps formatting consistent across a large pool of contributors. Black's code [style](https://black.readthedocs.io/en/stable/the_black_code_style/current_style.html#code-style) ensures consistent and opinionated code formatting. It automatically formats your Python code according to the Black style guide, enhancing code readability and maintainability.
553
554Key Features of Black:
555
556 Consistency: Black enforces a single, consistent coding style across your project, eliminating style debates and allowing developers to focus on code logic.
557
558 Readability: By applying a standard formatting style, Black improves code readability, making it easier to understand and collaborate on projects.
559
560 Automation: Black automates the code formatting process, saving time and effort. It eliminates the need for manual formatting and reduces the likelihood of inconsistencies.
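As a rough before/after illustration (a hypothetical function, not taken from this codebase), Black would turn the first definition below into the second:

```python
# Hypothetical example: the same function before and after running `black`.

# Before formatting:
def normalize_weights( weights,total ) :
    return [ w/total for w in weights ]

# After running `black .`:
def normalize_weights(weights, total):
    return [w / total for w in weights]
```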
561
562### Naming Conventions
563
564- `Classes:` Class names should normally use the CapWords Convention.
565- `Functions and Variables:` Function names should be lowercase, with words separated by underscores as necessary to improve readability. Variable names follow the same convention as function names.
566
567- `Constants:` Constants are usually defined on a module level and written in all capital letters with underscores separating words.
568
569- `Non-public Methods and Instance Variables:` Use a single leading underscore (_). This is a weak "internal use" indicator.
570
571- `Strongly "private" methods and variables:` Use a double leading underscore (__). This triggers name mangling in Python.
572
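A compact sketch of these conventions in one place (all names are hypothetical):

```python
# Hypothetical names, purely to illustrate the conventions above.
MAX_RETRIES = 3  # constant: all caps with underscores


class ExecutorRegistry:  # class: CapWords
    def __init__(self):
        self._cache = {}          # single leading underscore: internal use
        self.__secret_token = ""  # double leading underscore: name-mangled, "private"

    def register_executor(self, executor_id: str) -> None:  # method: snake_case
        retry_count = 0           # variable: snake_case
        self._cache[executor_id] = retry_count

    def _evict_stale_entries(self) -> None:  # non-public helper method
        self._cache.clear()
```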
573
574### Git commit style
575
576Here’s a model Git commit message when contributing:
577```
578Summarize changes in around 50 characters or less
579
580More detailed explanatory text, if necessary. Wrap it to about 72
581characters or so. In some contexts, the first line is treated as the
582subject of the commit and the rest of the text as the body. The
583blank line separating the summary from the body is critical (unless
584you omit the body entirely); various tools like `log`, `shortlog`
585and `rebase` can get confused if you run the two together.
586
587Explain the problem that this commit is solving. Focus on why you
588are making this change as opposed to how (the code explains that).
589Are there side effects or other unintuitive consequences of this
590change? Here's the place to explain them.
591
592Further paragraphs come after blank lines.
593
594 - Bullet points are okay, too
595
596 - Typically a hyphen or asterisk is used for the bullet, preceded
597 by a single space, with blank lines in between, but conventions
598 vary here
599
600If you use an issue tracker, put references to them at the bottom,
601like this:
602
603Resolves: #123
604See also: #456, #789
605```
606
607
608## The six rules of a great commit
609
610#### 1. Atomic Commits
611An “atomic” change revolves around one task or one fix.
612
613Atomic Approach
614 - Commit each fix or task as a separate change
615 - Only commit when a block of work is complete
616 - Commit each layout change separately
617 - Joint commit for layout file, code behind file, and additional resources
618
619Benefits
620
621- Easy to roll back without affecting other changes
622- Easy to make other changes on the fly
623- Easy to merge features to other branches
624
625#### Avoid trivial commit messages
626
627Commit messages like "fix", "fix2", or "fix3" don't provide any context or clear understanding of what changes the commit introduces. Here are some examples of good vs. bad commit messages:
628
629**Bad Commit Message:**
630
631 $ git commit -m "fix"
632
633**Good Commit Message:**
634
635 $ git commit -m "Fix typo in README file"
636
637> **Caveat**: When working with new features, an atomic commit will often consist of multiple files, since a layout file, code behind file, and additional resources may have been added/modified. You don’t want to commit all of these separately, because if you had to roll back the application to a state before the feature was added, it would involve multiple commit entries, and that can get confusing.
638
639#### 2. Separate subject from body with a blank line
640
641Not every commit requires both a subject and a body. Sometimes a single line is fine, especially when the change is so simple that no further context is necessary.
642
643For example:
644
645 Fix typo in introduction to user guide
646
647Nothing more need be said; if the reader wonders what the typo was, she can simply take a look at the change itself, i.e. use git show or git diff or git log -p.
648
649If you’re committing something like this at the command line, it’s easy to use the -m option to git commit:
650
651 $ git commit -m"Fix typo in introduction to user guide"
652
653However, when a commit merits a bit of explanation and context, you need to write a body. For example:
654
655 Derezz the master control program
656
657 MCP turned out to be evil and had become intent on world domination.
658 This commit throws Tron's disc into MCP (causing its deresolution)
659 and turns it back into a chess game.
660
661Commit messages with bodies are not so easy to write with the -m option. You’re better off writing the message in a proper text editor. [See Pro Git](https://git-scm.com/book/en/v2/Customizing-Git-Git-Configuration).
662
663In any case, the separation of subject from body pays off when browsing the log. Here’s the full log entry:
664
665 $ git log
666 commit 42e769bdf4894310333942ffc5a15151222a87be
667 Author: Kevin Flynn <[email protected]>
668 Date: Fri Jan 01 00:00:00 1982 -0200
669
670 Derezz the master control program
671
672 MCP turned out to be evil and had become intent on world domination.
673 This commit throws Tron's disc into MCP (causing its deresolution)
674 and turns it back into a chess game.
675
676
677#### 3. Limit the subject line to 50 characters
67850 characters is not a hard limit, just a rule of thumb. Keeping subject lines at this length ensures that they are readable, and forces the author to think for a moment about the most concise way to explain what’s going on.
679
680GitHub’s UI is fully aware of these conventions. It will warn you if you go past the 50 character limit, and it will truncate any subject line longer than 72 characters with an ellipsis, so keeping subject lines to 50 characters is best practice.
681
682#### 4. Use the imperative mood in the subject line
683Imperative mood just means “spoken or written as if giving a command or instruction”. A few examples:
684
685 Clean your room
686 Close the door
687 Take out the trash
688
689Each of the six rules you’re reading about right now is written in the imperative (“Wrap the body at 72 characters”, etc.).
690
691The imperative can sound a little rude; that’s why we don’t often use it. But it’s perfect for Git commit subject lines. One reason for this is that Git itself uses the imperative whenever it creates a commit on your behalf.
692
693For example, the default message created when using git merge reads:
694
695 Merge branch 'myfeature'
696
697And when using git revert:
698
699 Revert "Add the thing with the stuff"
700
701 This reverts commit cc87791524aedd593cff5a74532befe7ab69ce9d.
702
703Or when clicking the “Merge” button on a GitHub pull request:
704
705 Merge pull request #123 from someuser/somebranch
706
707So when you write your commit messages in the imperative, you’re following Git’s own built-in conventions. For example:
708
709 Refactor subsystem X for readability
710 Update getting started documentation
711 Remove deprecated methods
712 Release version 1.0.0
713
714Writing this way can be a little awkward at first. We’re more used to speaking in the indicative mood, which is all about reporting facts. That’s why commit messages often end up reading like this:
715
716 Fixed bug with Y
717 Changing behavior of X
718
719And sometimes commit messages get written as a description of their contents:
720
721 More fixes for broken stuff
722 Sweet new API methods
723
724To remove any confusion, here’s a simple rule to get it right every time.
725
726**A properly formed Git commit subject line should always be able to complete the following sentence:**
727
728 If applied, this commit will <your subject line here>
729
730For example:
731
732 If applied, this commit will refactor subsystem X for readability
733 If applied, this commit will update getting started documentation
734 If applied, this commit will remove deprecated methods
735 If applied, this commit will release version 1.0.0
736 If applied, this commit will merge pull request #123 from user/branch
737
738#### 5. Wrap the body at 72 characters
739Git never wraps text automatically. When you write the body of a commit message, you must mind its right margin, and wrap text manually.
740
741The recommendation is to do this at 72 characters, so that Git has plenty of room to indent text while still keeping everything under 80 characters overall.
742
743A good text editor can help here. It’s easy to configure Vim, for example, to wrap text at 72 characters when you’re writing a Git commit.
744
745#### 6. Use the body to explain what and why vs. how
746This [commit](https://github.com/bitcoin/bitcoin/commit/eb0b56b19017ab5c16c745e6da39c53126924ed6) from Bitcoin Core is a great example of explaining what changed and why:
747
748```
749commit eb0b56b19017ab5c16c745e6da39c53126924ed6
750Author: Pieter Wuille <[email protected]>
751Date: Fri Aug 1 22:57:55 2014 +0200
752
753 Simplify serialize.h's exception handling
754
755 Remove the 'state' and 'exceptmask' from serialize.h's stream
756 implementations, as well as related methods.
757
758 As exceptmask always included 'failbit', and setstate was always
759 called with bits = failbit, all it did was immediately raise an
760 exception. Get rid of those variables, and replace the setstate
761 with direct exception throwing (which also removes some dead
762 code).
763
764 As a result, good() is never reached after a failure (there are
765 only 2 calls, one of which is in tests), and can just be replaced
766 by !eof().
767
768 fail(), clear(n) and exceptions() are just never called. Delete
769 them.
770```
771
772Take a look at the [full diff](https://github.com/bitcoin/bitcoin/commit/eb0b56b19017ab5c16c745e6da39c53126924ed6) and just think how much time the author is saving fellow and future committers by taking the time to provide this context here and now. If he didn’t, it would probably be lost forever.
773
774In most cases, you can leave out details about how a change has been made. Code is generally self-explanatory in this regard (and if the code is so complex that it needs to be explained in prose, that’s what source comments are for). Just focus on making clear the reasons why you made the change in the first place—the way things worked before the change (and what was wrong with that), the way they work now, and why you decided to solve it the way you did.
775
776The future maintainer that thanks you may be yourself!
777
778
779
780#### Tools worth mentioning
781
782##### Using `--fixup`
783
784If you've made a commit and then realize you've missed something or made a minor mistake, you can use the `--fixup` option.
785
786For example, suppose you've made a commit with a hash `9fceb02`. Later, you realize you've left a debug statement in your code. Instead of making a new commit titled "remove debug statement" or "fix", you can do the following:
787
788 $ git commit --fixup 9fceb02
789
790After staging your fix, this creates a new commit whose message reads `fixup! <subject of the original commit>`.
791
792##### Interactive Rebase
793
794Interactive rebase, or `rebase -i`, can be used to squash these fixup commits into the original commits they're fixing, which cleans up your commit history. You can use the `--autosquash` option to automatically squash any commits whose subject starts with `fixup!` into their target commits.
795
796For example:
797
798 $ git rebase -i --autosquash HEAD~5
799
800This command starts an interactive rebase for the last 5 commits (`HEAD~5`). Any commits marked as "fixup" will be automatically moved to squash with their target commits.
801
802The benefit of using `--fixup` and interactive rebase is that it keeps your commit history clean and readable. It groups fixes with the commits they are related to, rather than having a separate "fix" commit that might not make sense to other developers (or even to you) in the future.
803
804
805---
806
807#### Pull Request and Squashing Commits Caveats
808
809While atomic commits are great for development and for understanding the changes within the branch, the commit history can get messy when merging to the main branch. To keep a cleaner and more understandable commit history in our main branch, we encourage squashing all the commits of a PR into one when merging.
810
811This single commit should provide an overview of the changes that the PR introduced. It should follow the guidelines for atomic commits (an atomic commit is complete, self-contained, and understandable) but on the scale of the entire feature, task, or fix that the PR addresses. This approach combines the benefits of atomic commits during development with a clean commit history in our main branch.
812
813Here is how you can squash commits:
814
815```bash
816git rebase -i HEAD~n
817```
818
819where `n` is the number of commits to squash. After running the command, replace `pick` with `squash` for the commits you want to squash into the previous commit. This will combine the commits and allow you to write a new commit message.
820
821In this context, an atomic commit message could look like:
822
823```
824Add feature X
825
826This commit introduces feature X which does A, B, and C. It adds
827new files for layout, updates the code behind the file, and introduces
828new resources. This change is important because it allows users to
829perform task Y more efficiently.
830
831It includes:
832- Creation of new layout file
833- Updates in the code-behind file
834- Addition of new resources
835
836Resolves: #123
837```
838
839In your PRs, remember to detail what the PR is introducing or fixing. This will be helpful for reviewers to understand the context and the reason behind the changes.
840
841
842
843---
844File: /datura/datura/consumers/base.py
845---
846
847import abc
848import logging
849
850from fastapi import WebSocket, WebSocketDisconnect
851
852from ..requests.base import BaseRequest
853
854logger = logging.getLogger(__name__)
855
856
857class BaseConsumer(abc.ABC):
858 def __init__(self, websocket: WebSocket):
859 self.websocket = websocket
860
861 @abc.abstractmethod
862 def accepted_request_type(self) -> type[BaseRequest]:
863 pass
864
865 async def connect(self):
866 await self.websocket.accept()
867
868 async def receive_message(self) -> BaseRequest:
869 data = await self.websocket.receive_text()
870 return self.accepted_request_type().parse(data)
871
872 async def send_message(self, msg: BaseRequest):
873 await self.websocket.send_text(msg.json())
874
875 async def disconnect(self):
876 try:
877 await self.websocket.close()
878 except Exception:
879 pass
880
881 @abc.abstractmethod
882 async def handle_message(self, data: BaseRequest):
883 raise NotImplementedError
884
885 async def handle(self):
886 # await self.connect()
887 try:
888 while True:
889 data: BaseRequest = await self.receive_message()
890 await self.handle_message(data)
891 except WebSocketDisconnect as ex:
892 logger.info("Websocket connection closed, e: %s", str(ex))
893 await self.disconnect()
894 except Exception as ex:
895 logger.info("Handling message error: %s", str(ex))
896 await self.disconnect()
897
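The abstract consumer above leaves two hooks to implement: `accepted_request_type()` and `handle_message()`. Note that `handle()` does not call `connect()`, so the caller must accept the socket first. A minimal sketch of wiring a concrete consumer into a FastAPI WebSocket route could look like this (the route path, class name, and the choice of `BaseValidatorRequest` as the accepted type are assumptions for illustration):

```python
# Hypothetical sketch, not part of the repository.
from fastapi import FastAPI, WebSocket

from datura.consumers.base import BaseConsumer
from datura.requests.base import BaseRequest
from datura.requests.validator_requests import BaseValidatorRequest


class EchoConsumer(BaseConsumer):
    def accepted_request_type(self) -> type[BaseRequest]:
        # receive_message() calls .parse() on this class, which dispatches
        # to the concrete validator request subclass by message_type.
        return BaseValidatorRequest

    async def handle_message(self, data: BaseRequest):
        # Trivial handler: send the decoded request straight back.
        await self.send_message(data)


app = FastAPI()


@app.websocket("/ws")
async def websocket_endpoint(websocket: WebSocket):
    consumer = EchoConsumer(websocket)
    await consumer.connect()  # handle() does not accept the socket itself
    await consumer.handle()
```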
898
899
900---
901File: /datura/datura/errors/__init__.py
902---
903
904
905
906
907---
908File: /datura/datura/errors/protocol.py
909---
910
911from datura.requests.base import BaseRequest
912
913
914class UnsupportedMessageReceived(Exception):
915 def __init__(self, msg: BaseRequest):
916 self.msg = msg
917
918 def __str__(self):
919 return f"{type(self).__name__}: {self.msg.json()}"
920
921 __repr__ = __str__
922
923
924
925---
926File: /datura/datura/requests/base.py
927---
928
929import abc
930import enum
931import json
932
933import pydantic
934
935
936class ValidationError(Exception):
937 def __init__(self, msg):
938 self.msg = msg
939
940 @classmethod
941 def from_json_decode_error(cls, exc: json.JSONDecodeError):
942 return cls(exc.args[0])
943
944 @classmethod
945 def from_pydantic_validation_error(cls, exc: pydantic.ValidationError):
946 return cls(json.dumps(exc.json()))
947
948 def __repr__(self):
949 return f"{type(self).__name__}({self.msg})"
950
951
952def all_subclasses(cls: type):
953 for subcls in cls.__subclasses__():
954 yield subcls
955 yield from all_subclasses(subcls)
956
957
958base_class_to_request_type_mapping = {}
959
960
961class BaseRequest(pydantic.BaseModel, abc.ABC):
962 message_type: enum.Enum
963
964 @classmethod
965 def type_to_model(cls, type_: enum.Enum) -> type["BaseRequest"]:
966 mapping = base_class_to_request_type_mapping.get(cls)
967 if not mapping:
968 mapping = {}
969 for klass in all_subclasses(cls):
970 if not (message_type := klass.__fields__.get("message_type")):
971 continue
972 if not message_type.default:
973 continue
974 mapping[message_type.default] = klass
975 base_class_to_request_type_mapping[cls] = mapping
976
977 return mapping[type_]
978
979 @classmethod
980 def parse(cls, str_: str):
981 try:
982 json_ = json.loads(str_)
983 except json.JSONDecodeError as exc:
984 raise ValidationError.from_json_decode_error(exc)
985
986 try:
987 base_model_object = cls.parse_obj(json_)
988 except pydantic.ValidationError as exc:
989 raise ValidationError.from_pydantic_validation_error(exc)
990
991 target_model = cls.type_to_model(base_model_object.message_type)
992
993 try:
994 return target_model.parse_obj(json_)
995 except pydantic.ValidationError as exc:
996 raise ValidationError.from_pydantic_validation_error(exc)
997
998
999
1000---
1001File: /datura/datura/requests/miner_requests.py
1002---
1003
1004import enum
1005
1006import pydantic
1007from datura.requests.base import BaseRequest
1008
1009
1010class RequestType(enum.Enum):
1011 GenericError = "GenericError"
1012 AcceptJobRequest = "AcceptJobRequest"
1013 DeclineJobRequest = "DeclineJobRequest"
1014 AcceptSSHKeyRequest = "AcceptSSHKeyRequest"
1015 FailedRequest = "FailedRequest"
1016 UnAuthorizedRequest = "UnAuthorizedRequest"
1017 SSHKeyRemoved = "SSHKeyRemoved"
1018
1019
1020class Executor(pydantic.BaseModel):
1021 uuid: str
1022 address: str
1023 port: int
1024
1025
1026class BaseMinerRequest(BaseRequest):
1027 message_type: RequestType
1028
1029
1030class GenericError(BaseMinerRequest):
1031 message_type: RequestType = RequestType.GenericError
1032 details: str | None = None
1033
1034
1035class AcceptJobRequest(BaseMinerRequest):
1036 message_type: RequestType = RequestType.AcceptJobRequest
1037 executors: list[Executor]
1038
1039
1040class DeclineJobRequest(BaseMinerRequest):
1041 message_type: RequestType = RequestType.DeclineJobRequest
1042
1043
1044class ExecutorSSHInfo(pydantic.BaseModel):
1045 uuid: str
1046 address: str
1047 port: int
1048 ssh_username: str
1049 ssh_port: int
1050 python_path: str
1051 root_dir: str
1052 port_range: str | None = None
1053 port_mappings: str | None = None
1054
1055class AcceptSSHKeyRequest(BaseMinerRequest):
1056 message_type: RequestType = RequestType.AcceptSSHKeyRequest
1057 executors: list[ExecutorSSHInfo]
1058
1059
1060class SSHKeyRemoved(BaseMinerRequest):
1061 message_type: RequestType = RequestType.SSHKeyRemoved
1062
1063
1064class FailedRequest(BaseMinerRequest):
1065 message_type: RequestType = RequestType.FailedRequest
1066 details: str | None = None
1067
1068
1069class UnAuthorizedRequest(FailedRequest):
1070 message_type: RequestType = RequestType.UnAuthorizedRequest
1071
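For context, here is a hedged sketch of how `BaseRequest.parse()` (defined in `datura/requests/base.py` above) resolves one of these message types; the JSON payload is invented for illustration:

```python
# Hypothetical payload, for illustration only.
from datura.requests.miner_requests import AcceptJobRequest, BaseMinerRequest

raw = (
    '{"message_type": "AcceptJobRequest", '
    '"executors": [{"uuid": "abc-123", "address": "10.0.0.1", "port": 8001}]}'
)

# parse() validates the JSON against the base model, reads message_type,
# then re-validates against the matching subclass (AcceptJobRequest here).
request = BaseMinerRequest.parse(raw)
assert isinstance(request, AcceptJobRequest)
print(request.executors[0].address)  # "10.0.0.1"
```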
1072
1073
1074---
1075File: /datura/datura/requests/validator_requests.py
1076---
1077
1078import enum
1079import json
1080from typing import Optional
1081
1082import pydantic
1083from datura.requests.base import BaseRequest
1084
1085
1086class RequestType(enum.Enum):
1087 AuthenticateRequest = "AuthenticateRequest"
1088 SSHPubKeySubmitRequest = "SSHPubKeySubmitRequest"
1089 SSHPubKeyRemoveRequest = "SSHPubKeyRemoveRequest"
1090
1091
1092class BaseValidatorRequest(BaseRequest):
1093 message_type: RequestType
1094
1095
1096class AuthenticationPayload(pydantic.BaseModel):
1097 validator_hotkey: str
1098 miner_hotkey: str
1099 timestamp: int
1100
1101 def blob_for_signing(self):
1102 instance_dict = self.model_dump()
1103 return json.dumps(instance_dict, sort_keys=True)
1104
1105
1106class AuthenticateRequest(BaseValidatorRequest):
1107 message_type: RequestType = RequestType.AuthenticateRequest
1108 payload: AuthenticationPayload
1109 signature: str
1110
1111 def blob_for_signing(self):
1112 return self.payload.blob_for_signing()
1113
1114
1115class SSHPubKeySubmitRequest(BaseValidatorRequest):
1116 message_type: RequestType = RequestType.SSHPubKeySubmitRequest
1117 public_key: bytes
1118 executor_id: Optional[str] = None
1119
1120
1121class SSHPubKeyRemoveRequest(BaseValidatorRequest):
1122 message_type: RequestType = RequestType.SSHPubKeyRemoveRequest
1123 public_key: bytes
1124 executor_id: Optional[str] = None
1125
1126
1127
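The authentication flow above signs only the `AuthenticationPayload` blob, so the receiving miner can verify the signature knowing nothing but the claimed validator hotkey address. A minimal sketch of that handshake (illustrative only; the throwaway keypair and the miner hotkey below are placeholders, not repository values):

```python
# Illustrative sketch of building, signing, and verifying an AuthenticateRequest (not repository code).
import time

import bittensor
from datura.requests.validator_requests import AuthenticateRequest, AuthenticationPayload

# A throwaway keypair stands in for the validator hotkey.
validator_keypair = bittensor.Keypair.create_from_mnemonic(bittensor.Keypair.generate_mnemonic())

payload = AuthenticationPayload(
    validator_hotkey=validator_keypair.ss58_address,
    miner_hotkey="miner-hotkey-placeholder",  # placeholder miner hotkey
    timestamp=int(time.time()),
)
request = AuthenticateRequest(
    payload=payload,
    signature=f"0x{validator_keypair.sign(payload.blob_for_signing()).hex()}",
)

# Receiving side: rebuild a verify-only keypair from the claimed address and check the signature,
# mirroring the miner-side consumer check shown later in this listing.
verifier = bittensor.Keypair(ss58_address=request.payload.validator_hotkey)
assert verifier.verify(request.blob_for_signing(), request.signature)
```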
1128---
1129File: /datura/datura/__init__.py
1130---
1131
1132
1133
1134
1135---
1136File: /datura/tests/__init__.py
1137---
1138
1139
1140
1141
1142---
1143File: /datura/README.md
1144---
1145
1146# datura
1147
1148
1149
1150---
1151File: /neurons/executor/src/core/__init__.py
1152---
1153
1154
1155
1156
1157---
1158File: /neurons/executor/src/core/config.py
1159---
1160
1161from typing import Optional
1162from pydantic import Field
1163from pydantic_settings import BaseSettings, SettingsConfigDict
1164
1165
1166class Settings(BaseSettings):
1167 model_config = SettingsConfigDict(env_file=".env", extra="ignore")
1168 PROJECT_NAME: str = "compute-subnet-executor"
1169
1170 INTERNAL_PORT: int = Field(env="INTERNAL_PORT", default=8001)
1171 SSH_PORT: int = Field(env="SSH_PORT", default=2200)
1172 SSH_PUBLIC_PORT: Optional[int] = Field(env="SSH_PUBLIC_PORT", default=None)
1173
1174 MINER_HOTKEY_SS58_ADDRESS: str = Field(env="MINER_HOTKEY_SS58_ADDRESS")
1175
1176 RENTING_PORT_RANGE: Optional[str] = Field(env="RENTING_PORT_RANGE", default=None)
1177 RENTING_PORT_MAPPINGS: Optional[str] = Field(env="RENTING_PORT_MAPPINGS", default=None)
1178
1179 ENV: str = Field(env="ENV", default="dev")
1180
1181
1182settings = Settings()
1183
1184
1185
1186---
1187File: /neurons/executor/src/core/logger.py
1188---
1189
1190import logging
1191import json
1192
1193
1194def get_logger(name: str):
1195 logger = logging.getLogger(name)
1196 handler = logging.StreamHandler()
1197 formatter = logging.Formatter(
1198 "Name: %(name)s | Time: %(asctime)s | Level: %(levelname)s | File: %(filename)s | Function: %(funcName)s | Line: %(lineno)s | Process: %(process)d | Message: %(message)s"
1199 )
1200 handler.setFormatter(formatter)
1201 logger.addHandler(handler)
1202 logger.setLevel(logging.INFO)
1203 return logger
1204
1205
1206class StructuredMessage:
1207 def __init__(self, message, extra: dict):
1208 self.message = message
1209 self.extra = extra
1210
1211 def __str__(self):
1212 return "%s >>> %s" % (self.message, json.dumps(self.extra)) # noqa
1213
1214
1215_m = StructuredMessage
1216
1217
1218
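The `StructuredMessage` helper (`_m`) above appends a JSON blob to the human-readable message. A tiny illustration (assuming the executor's `src` directory is on the import path):

```python
# Illustrative use of the structured-message helper defined above (not repository code).
from core.logger import _m, get_logger

logger = get_logger("example")
logger.info(_m("ssh key uploaded", extra={"ssh_port": 2200, "user": "root"}))
# Emits roughly: '... Message: ssh key uploaded >>> {"ssh_port": 2200, "user": "root"}'
```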
1219---
1220File: /neurons/executor/src/middlewares/__init__.py
1221---
1222
1223
1224
1225
1226---
1227File: /neurons/executor/src/middlewares/miner.py
1228---
1229
1230import bittensor
1231from fastapi.responses import JSONResponse
1232from payloads.miner import MinerAuthPayload
1233from pydantic import ValidationError
1234from starlette.middleware.base import BaseHTTPMiddleware
1235
1236from core.config import settings
1237from core.logger import _m, get_logger
1238
1239logger = get_logger(__name__)
1240
1241
1242class MinerMiddleware(BaseHTTPMiddleware):
1243 def __init__(self, app) -> None:
1244 super().__init__(app)
1245
1246 async def dispatch(self, request, call_next):
1247 try:
1248 body_bytes = await request.body()
1249 miner_ip = request.client.host
1250 default_extra = {"miner_ip": miner_ip}
1251
1252 # Parse it into the Pydantic model
1253 payload = MinerAuthPayload.model_validate_json(body_bytes)
1254
1255 logger.info(_m("miner ip", extra=default_extra))
1256
1257 keypair = bittensor.Keypair(ss58_address=settings.MINER_HOTKEY_SS58_ADDRESS)
1258 if not keypair.verify(payload.public_key, payload.signature):
1259 logger.error(
1260 _m(
1261 "Auth failed. incorrect signature",
1262 extra={
1263 **default_extra,
1264 "signature": payload.signature,
1265 "public_key": payload.public_key,
1266 "miner_hotkey": settings.MINER_HOTKEY_SS58_ADDRESS,
1267 },
1268 )
1269 )
1270 return JSONResponse(status_code=401, content="Unauthorized")
1271
1272 response = await call_next(request)
1273 return response
1274 except ValidationError as e:
1275 # Handle validation error if needed
1276 error_message = str(_m("Validation Error", extra={"errors": str(e.errors())}))
1277 logger.error(error_message)
1278 return JSONResponse(status_code=422, content=error_message)
1279
1280
1281
1282---
1283File: /neurons/executor/src/payloads/__init__.py
1284---
1285
1286
1287
1288
1289---
1290File: /neurons/executor/src/payloads/miner.py
1291---
1292
1293from pydantic import BaseModel
1294
1295
1296class MinerAuthPayload(BaseModel):
1297 public_key: str
1298 signature: str
1299
1300
1301
1302---
1303File: /neurons/executor/src/routes/__init__.py
1304---
1305
1306
1307
1308
1309---
1310File: /neurons/executor/src/routes/apis.py
1311---
1312
1313from typing import Annotated
1314
1315from fastapi import APIRouter, Depends
1316from services.miner_service import MinerService
1317
1318from payloads.miner import MinerAuthPayload
1319
1320apis_router = APIRouter()
1321
1322
1323@apis_router.post("/upload_ssh_key")
1324async def upload_ssh_key(
1325 payload: MinerAuthPayload, miner_service: Annotated[MinerService, Depends(MinerService)]
1326):
1327 return await miner_service.upload_ssh_key(payload)
1328
1329
1330@apis_router.post("/remove_ssh_key")
1331async def remove_ssh_key(
1332 payload: MinerAuthPayload, miner_service: Annotated[MinerService, Depends(MinerService)]
1333):
1334 return await miner_service.remove_ssh_key(payload)
1335
1336
1337
1338---
1339File: /neurons/executor/src/services/miner_service.py
1340---
1341
1342import asyncio
1343import sys
1344import logging
1345from pathlib import Path
1346
1347from typing import Annotated
1348from fastapi import Depends
1349
1350from core.config import settings
1351from services.ssh_service import SSHService
1352
1353from payloads.miner import MinerAuthPayload
1354
1355logger = logging.getLogger(__name__)
1356
1357
1358class MinerService:
1359 def __init__(
1360 self,
1361 ssh_service: Annotated[SSHService, Depends(SSHService)],
1362 ):
1363 self.ssh_service = ssh_service
1364
1365    async def upload_ssh_key(self, payload: MinerAuthPayload):
1366        self.ssh_service.add_pubkey_to_host(payload.public_key)
1367
1368 return {
1369 "ssh_username": self.ssh_service.get_current_os_user(),
1370 "ssh_port": settings.SSH_PUBLIC_PORT or settings.SSH_PORT,
1371 "python_path": sys.executable,
1372 "root_dir": str(Path(__file__).resolve().parents[2]),
1373 "port_range": settings.RENTING_PORT_RANGE,
1374 "port_mappings": settings.RENTING_PORT_MAPPINGS
1375 }
1376
1377    async def remove_ssh_key(self, payload: MinerAuthPayload):
1378        return self.ssh_service.remove_pubkey_from_host(payload.public_key)
1379
1380
1381
1382---
1383File: /neurons/executor/src/services/ssh_service.py
1384---
1385
1386import getpass
1387import os
1388
1389
1390class SSHService:
1391 def add_pubkey_to_host(self, pub_key: str):
1392 with open(os.path.expanduser("~/.ssh/authorized_keys"), "a") as file:
1393 file.write(pub_key + "\n")
1394
1395 def remove_pubkey_from_host(self, pub_key: str):
1396 authorized_keys_path = os.path.expanduser("~/.ssh/authorized_keys")
1397
1398 with open(authorized_keys_path, "r") as file:
1399 lines = file.readlines()
1400
1401 with open(authorized_keys_path, "w") as file:
1402 for line in lines:
1403 if line.strip() != pub_key:
1404 file.write(line)
1405
1406 def get_current_os_user(self) -> str:
1407 return getpass.getuser()
1408
1409
1410
1411---
1412File: /neurons/executor/src/executor.py
1413---
1414
1415import logging
1416
1417from fastapi import FastAPI
1418import uvicorn
1419
1420from core.config import settings
1421from middlewares.miner import MinerMiddleware
1422from routes.apis import apis_router
1423
1424# Set up logging
1425logging.basicConfig(level=logging.INFO)
1426
1427app = FastAPI(
1428 title=settings.PROJECT_NAME,
1429)
1430
1431app.add_middleware(MinerMiddleware)
1432app.include_router(apis_router)
1433
1434reload = settings.ENV == "dev"
1435
1436if __name__ == "__main__":
1437 uvicorn.run("executor:app", host="0.0.0.0", port=settings.INTERNAL_PORT, reload=reload)
1438
1439
1440
1441---
1442File: /neurons/executor/src/gpus_utility.py
1443---
1444
1445import asyncio
1446import logging
1447import time
1448
1449import aiohttp
1450import click
1451import pynvml
1452import psutil
1453
1454logger = logging.getLogger(__name__)
1455logger.setLevel(logging.INFO)
1456
1457
1458class GPUMetricsTracker:
1459 def __init__(self, threshold_percent: float = 10.0):
1460 self.previous_metrics: dict[int, dict] = {}
1461 self.threshold = threshold_percent
1462
1463 def has_significant_change(self, gpu_id: int, util: float, mem_used: float) -> bool:
1464 if gpu_id not in self.previous_metrics:
1465 self.previous_metrics[gpu_id] = {"util": util, "mem_used": mem_used}
1466 return True
1467
1468 prev = self.previous_metrics[gpu_id]
1469 util_diff = abs(util - prev["util"])
1470        mem_diff_percent = abs(mem_used - prev["mem_used"]) / max(prev["mem_used"], 1) * 100  # guard against division by zero
1471
1472 if util_diff >= self.threshold or mem_diff_percent >= self.threshold:
1473 self.previous_metrics[gpu_id] = {"util": util, "mem_used": mem_used}
1474 return True
1475 return False
1476
1477
1478async def scrape_gpu_metrics(
1479 interval: int,
1480 program_id: str,
1481 signature: str,
1482 executor_id: str,
1483 validator_hotkey: str,
1484 compute_rest_app_url: str,
1485):
1486 try:
1487 pynvml.nvmlInit()
1488 device_count = pynvml.nvmlDeviceGetCount()
1489 if device_count == 0:
1490 logger.warning("No NVIDIA GPUs found in the system")
1491 return
1492 except pynvml.NVMLError as e:
1493 logger.error(f"Failed to initialize NVIDIA Management Library: {e}")
1494 logger.error(
1495 "This might be because no NVIDIA GPU is present or drivers are not properly installed"
1496 )
1497 return
1498
1499 http_url = f"{compute_rest_app_url}/validator/{validator_hotkey}/update-gpu-metrics"
1500 logger.info(f"Will send metrics to: {http_url}")
1501
1502 # Initialize the tracker
1503 tracker = GPUMetricsTracker(threshold_percent=10.0)
1504
1505 async with aiohttp.ClientSession() as session:
1506 logger.info(f"Scraping metrics for {device_count} GPUs...")
1507 try:
1508 while True:
1509 try:
1510 gpu_utilization = []
1511 should_send = False
1512
1513 for i in range(device_count):
1514 handle = pynvml.nvmlDeviceGetHandleByIndex(i)
1515
1516 name = pynvml.nvmlDeviceGetName(handle)
1517 if isinstance(name, bytes):
1518 name = name.decode("utf-8")
1519
1520 utilization = pynvml.nvmlDeviceGetUtilizationRates(handle)
1521 memory = pynvml.nvmlDeviceGetMemoryInfo(handle)
1522
1523 gpu_util = utilization.gpu
1524 mem_used = memory.used
1525 mem_total = memory.total
1526 timestamp = time.strftime("%Y-%m-%d %H:%M:%S")
1527
1528 # Check if there's a significant change for this GPU
1529 if tracker.has_significant_change(i, gpu_util, mem_used):
1530 should_send = True
1531 logger.info(f"Significant change detected for GPU {i}")
1532
1533 gpu_utilization.append(
1534 {
1535 "utilization_in_percent": gpu_util,
1536 "memory_utilization_in_bytes": mem_used,
1537 "memory_utilization_in_percent": round(mem_used / mem_total * 100, 1)
1538 }
1539 )
1540
1541 # Get CPU, RAM, and Disk metrics using psutil
1542 cpu_percent = psutil.cpu_percent(interval=1)
1543 ram = psutil.virtual_memory()
1544 disk = psutil.disk_usage('/')
1545
1546 cpu_ram_utilization = {
1547 "cpu_utilization_in_percent": cpu_percent,
1548 "ram_utilization_in_bytes": ram.used,
1549 "ram_utilization_in_percent": ram.percent
1550 }
1551
1552 disk_utilization = {
1553 "disk_utilization_in_bytes": disk.used,
1554 "disk_utilization_in_percent": disk.percent
1555 }
1556
1557 # Only send if there's a significant change in any GPU
1558 if should_send:
1559 payload = {
1560 "gpu_utilization": gpu_utilization,
1561 "cpu_ram_utilization": cpu_ram_utilization,
1562 "disk_utilization": disk_utilization,
1563 "timestamp": timestamp,
1564 "program_id": program_id,
1565 "signature": signature,
1566 "executor_id": executor_id,
1567 }
1568 # Send HTTP POST request
1569 async with session.post(http_url, json=payload) as response:
1570 if response.status == 200:
1571 logger.info("Successfully sent metrics to backend")
1572 else:
1573 logger.error(f"Failed to send metrics. Status: {response.status}")
1574 text = await response.text()
1575 logger.error(f"Response: {text}")
1576
1577 await asyncio.sleep(interval)
1578
1579 except Exception as e:
1580 logger.error(f"Error in main loop: {e}")
1581 await asyncio.sleep(5) # Wait before retrying
1582
1583 except KeyboardInterrupt:
1584 logger.info("Stopping GPU scraping...")
1585 finally:
1586 pynvml.nvmlShutdown()
1587
1588
1589@click.command()
1590@click.option("--program_id", prompt="Program ID", help="Program ID for monitoring")
1591@click.option("--signature", prompt="Signature", help="Signature for verification")
1592@click.option("--executor_id", prompt="Executor ID", help="Executor ID")
1593@click.option("--validator_hotkey", prompt="Validator Hotkey", help="Validator hotkey")
1594@click.option("--compute_rest_app_url", prompt="Compute-app Url", help="Compute-app Url")
1595@click.option("--interval", default=5, type=int, help="Scraping interval in seconds")
1596def main(
1597 interval: int,
1598 program_id: str,
1599 signature: str,
1600 executor_id: str,
1601 validator_hotkey: str,
1602 compute_rest_app_url: str,
1603):
1604 asyncio.run(
1605 scrape_gpu_metrics(
1606 interval, program_id, signature, executor_id, validator_hotkey, compute_rest_app_url
1607 )
1608 )
1609
1610
1611if __name__ == "__main__":
1612 main()
1613
1614
1615
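`GPUMetricsTracker` gates the HTTP posts on relative change: utilization is compared in absolute percentage points and memory as a percentage of the previous reading, both against the same threshold. A short illustration of those semantics (assuming the module is importable as `gpus_utility`):

```python
# Illustrative check of GPUMetricsTracker's change-detection behavior (not repository code).
from gpus_utility import GPUMetricsTracker

tracker = GPUMetricsTracker(threshold_percent=10.0)

# The first observation for a GPU always counts as significant and seeds the baseline.
assert tracker.has_significant_change(0, util=50.0, mem_used=8_000_000_000)

# A 5-point utilization move and a ~1% memory move stay below the 10% threshold.
assert not tracker.has_significant_change(0, util=55.0, mem_used=8_100_000_000)

# A 12-point utilization jump (relative to the unchanged baseline) crosses the threshold.
assert tracker.has_significant_change(0, util=62.0, mem_used=8_100_000_000)
```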
1616---
1617File: /neurons/executor/docker_build.sh
1618---
1619
1620#!/bin/bash
1621set -eux -o pipefail
1622
1623IMAGE_NAME="daturaai/compute-subnet-executor:$TAG"
1624
1625docker build --build-context datura=../../datura -t $IMAGE_NAME .
1626
1627
1628---
1629File: /neurons/executor/docker_publish.sh
1630---
1631
1632#!/bin/bash
1633set -eux -o pipefail
1634
1635source ./docker_build.sh
1636
1637echo "$DOCKERHUB_PAT" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
1638docker push "$IMAGE_NAME"
1639
1640
1641---
1642File: /neurons/executor/docker_runner_build.sh
1643---
1644
1645#!/bin/bash
1646set -eux -o pipefail
1647
1648IMAGE_NAME="daturaai/compute-subnet-executor-runner:$TAG"
1649
1650docker build --file Dockerfile.runner -t $IMAGE_NAME .
1651
1652
1653---
1654File: /neurons/executor/docker_runner_publish.sh
1655---
1656
1657#!/bin/bash
1658set -eux -o pipefail
1659
1660source ./docker_runner_build.sh
1661
1662echo "$DOCKERHUB_PAT" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
1663docker push "$IMAGE_NAME"
1664
1665
1666---
1667File: /neurons/executor/entrypoint.sh
1668---
1669
1670#!/bin/sh
1671set -eu
1672
1673docker compose up --pull always --detach --wait --force-recreate
1674
1675# Clean docker images
1676docker image prune -f
1677
1678while true
1679do
1680 docker compose logs -f
1681 echo 'All containers died'
1682 sleep 10
1683done
1684
1685
1686
1687---
1688File: /neurons/executor/README.md
1689---
1690
1691# Executor
1692
1693## Setup project
1694### Requirements
1695* Ubuntu machine
1696* Install [docker](https://docs.docker.com/engine/install/ubuntu/)
1697
1698
1699### Step 1: Clone project
1700
1701```
1702git clone https://github.com/Datura-ai/compute-subnet.git
1703```
1704
1705### Step 2: Install Required Tools
1706
1707Run the following command to install the required tools:
1708```shell
1709cd compute-subnet && chmod +x scripts/install_executor_on_ubuntu.sh && scripts/install_executor_on_ubuntu.sh
1710```
1711
1712If you don't have sudo on your machine, run
1713```shell
1714sed -i 's/sudo //g' scripts/install_executor_on_ubuntu.sh
1715```
1716to remove `sudo` from the setup script commands.
1717
1718### Step 3: Configure Docker for Nvidia
1719
1720Please follow [this guide](https://stackoverflow.com/questions/72932940/failed-to-initialize-nvml-unknown-error-in-docker-after-few-hours) to set up Docker for NVIDIA properly.
1721
1722
1723### Step 4: Install and Run
1724
1725* Go to executor root
1726```shell
1727cd neurons/executor
1728```
1729
1730* Add a `.env` file to the project
1731```shell
1732cp .env.template .env
1733```
1734
1735Set the correct miner wallet hotkey address for `MINER_HOTKEY_SS58_ADDRESS`.
1736You can change the `INTERNAL_PORT`, `EXTERNAL_PORT`, and `SSH_PORT` values as needed.
1737
1738- **INTERNAL_PORT**: internal port of your executor docker container
1739- **EXTERNAL_PORT**: external expose port of your executor docker container
1740- **SSH_PORT**: SSH port mapped to port 22 of your executor docker container
1741- **SSH_PUBLIC_PORT**: [Optional] publicly accessible SSH port of your executor docker container. If `SSH_PUBLIC_PORT` is equal to `SSH_PORT`, you don't have to specify this port.
1742- **MINER_HOTKEY_SS58_ADDRESS**: the miner hotkey address
1743- **RENTING_PORT_RANGE**: The range of ports that is publicly accessible. This can be empty if all ports are open. Available formats are:
1744  - Range specification (`from-to`): Miners can specify a range of ports, such as 2000-2005. This means ports 2000 through 2005 will be open for the validator to select.
1745  - Specific ports (`port1,port2,port3`): Miners can specify individual ports, such as 2000,2001,2002. This means only ports 2000, 2001, and 2002 will be available to the validator.
1746  - Default behavior: If no ports are specified, the validator will assume that all ports on the executor are available.
1747- **RENTING_PORT_MAPPINGS**: Internal-to-external port mappings. Use this variable when you run a proxy in front of your executors and the internal and external ports can't be the same. You can ignore it if all ports are open or if the internal and external ports match. Example:
1748  - If internal port 46681 is mapped to external port 56681 and internal port 46682 is mapped to external port 56682, then RENTING_PORT_MAPPINGS="[[46681, 56681], [46682, 56682]]"
1749
1750Note: When only specific ports are available, use either **RENTING_PORT_RANGE** or **RENTING_PORT_MAPPINGS**, but DO NOT use both. A short parsing sketch of both formats is shown below.
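A minimal parsing sketch of the two formats above (illustrative only; these helper functions are not part of the codebase):

```python
# Illustrative only: how the RENTING_PORT_RANGE and RENTING_PORT_MAPPINGS formats can be read.
import json


def parse_port_range(value: str) -> list[int]:
    """Accepts "2000-2005" or "2000,2001,2002"; an empty value means all ports."""
    if not value:
        return []
    if "-" in value:
        start, end = (int(part) for part in value.split("-", 1))
        return list(range(start, end + 1))
    return [int(part) for part in value.split(",")]


def parse_port_mappings(value: str) -> dict[int, int]:
    """Accepts '[[46681, 56681], [46682, 56682]]' as internal-to-external pairs."""
    return {internal: external for internal, external in json.loads(value)}


assert parse_port_range("2000-2005") == [2000, 2001, 2002, 2003, 2004, 2005]
assert parse_port_mappings("[[46681, 56681], [46682, 56682]]") == {46681: 56681, 46682: 56682}
```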
1751
1752
1753* Run project
1754```shell
1755docker compose up -d
1756```
1757
1758
1759
1760---
1761File: /neurons/executor/run.sh
1762---
1763
1764#!/bin/bash
1765set -eux -o pipefail
1766
1767# start ssh service
1768ssh-keygen -A
1769service ssh start
1770
1771# run fastapi app
1772python src/executor.py
1773
1774
1775---
1776File: /neurons/miners/migrations/versions/8e52603bd563_create_validator_model.py
1777---
1778
1779"""create validator model
1780
1781Revision ID: 8e52603bd563
1782Revises:
1783Create Date: 2024-07-15 10:47:41.596221
1784
1785"""
1786
1787from collections.abc import Sequence
1788
1789import sqlalchemy as sa
1790import sqlmodel
1791import sqlmodel.sql.sqltypes
1792from alembic import op
1793
1794# revision identifiers, used by Alembic.
1795revision: str = "8e52603bd563"
1796down_revision: str | None = None
1797branch_labels: str | Sequence[str] | None = None
1798depends_on: str | Sequence[str] | None = None
1799
1800
1801def upgrade() -> None:
1802 # ### commands auto generated by Alembic - please adjust! ###
1803 op.create_table(
1804 "validator",
1805 sa.Column('uuid', sa.Uuid(), nullable=False),
1806 sa.Column('validator_hotkey', sqlmodel.sql.sqltypes.AutoString(), nullable=False),
1807 sa.Column('active', sa.Boolean(), nullable=False),
1808 sa.PrimaryKeyConstraint('uuid'),
1809 sa.UniqueConstraint('validator_hotkey')
1810 )
1811 # ### end Alembic commands ###
1812
1813
1814def downgrade() -> None:
1815 # ### commands auto generated by Alembic - please adjust! ###
1816 op.drop_table("validator")
1817 # ### end Alembic commands ###
1818
1819
1820
1821---
1822File: /neurons/miners/migrations/versions/eb0b92cbc38e_add_executors_table.py
1823---
1824
1825"""Add executors table
1826
1827Revision ID: eb0b92cbc38e
1828Revises: 8e52603bd563
1829Create Date: 2024-09-06 06:56:04.990324
1830
1831"""
1832
1833from collections.abc import Sequence
1834
1835import sqlalchemy as sa
1836import sqlmodel
1837import sqlmodel.sql.sqltypes
1838from alembic import op
1839
1840# revision identifiers, used by Alembic.
1841revision: str = "eb0b92cbc38e"
1842down_revision: str | None = "8e52603bd563"
1843branch_labels: str | Sequence[str] | None = None
1844depends_on: str | Sequence[str] | None = None
1845
1846
1847def upgrade() -> None:
1848 # ### commands auto generated by Alembic - please adjust! ###
1849 op.create_table(
1850 "executor",
1851 sa.Column("uuid", sa.Uuid(), nullable=False),
1852 sa.Column("address", sqlmodel.sql.sqltypes.AutoString(), nullable=False),
1853 sa.Column("port", sa.Integer(), nullable=False),
1854 sa.Column("validator", sqlmodel.sql.sqltypes.AutoString(), nullable=False),
1855 sa.PrimaryKeyConstraint("uuid"),
1856 sa.UniqueConstraint("address", "port", name="unique_contraint_address_port"),
1857 )
1858 # ### end Alembic commands ###
1859
1860
1861def downgrade() -> None:
1862 # ### commands auto generated by Alembic - please adjust! ###
1863 op.drop_table("executor")
1864 # ### end Alembic commands ###
1865
1866
1867
1868---
1869File: /neurons/miners/migrations/env.py
1870---
1871
1872import os
1873from logging.config import fileConfig
1874from pathlib import Path
1875
1876from alembic import context
1877from dotenv import load_dotenv
1878from sqlalchemy import engine_from_config, pool
1879from sqlmodel import SQLModel
1880
1881from models.executor import * # noqa
1882from models.validator import * # noqa
1883
1884# this is the Alembic Config object, which provides
1885# access to the values within the .ini file in use.
1886config = context.config
1887
1888# Interpret the config file for Python logging.
1889# This line sets up loggers basically.
1890if config.config_file_name is not None:
1891 fileConfig(config.config_file_name)
1892
1893# add your model's MetaData object here
1894# for 'autogenerate' support
1895# from myapp import mymodel
1896# target_metadata = mymodel.Base.metadata
1897
1898target_metadata = SQLModel.metadata
1899
1900# other values from the config, defined by the needs of env.py,
1901# can be acquired:
1902# my_important_option = config.get_main_option("my_important_option")
1903# ... etc.
1904
1905current_dir = Path(__file__).parent
1906
1907load_dotenv(str(current_dir / ".." / ".env"))
1908
1909
1910def get_url():
1911 url = os.getenv("SQLALCHEMY_DATABASE_URI")
1912 return url
1913
1914
1915def run_migrations_offline() -> None:
1916 """Run migrations in 'offline' mode.
1917
1918 This configures the context with just a URL
1919 and not an Engine, though an Engine is acceptable
1920 here as well. By skipping the Engine creation
1921 we don't even need a DBAPI to be available.
1922
1923 Calls to context.execute() here emit the given string to the
1924 script output.
1925
1926 """
1927 url = get_url()
1928 context.configure(
1929 url=url,
1930 target_metadata=target_metadata,
1931 literal_binds=True,
1932 dialect_opts={"paramstyle": "named"},
1933 )
1934
1935 with context.begin_transaction():
1936 context.run_migrations()
1937
1938
1939def run_migrations_online() -> None:
1940 """Run migrations in 'online' mode.
1941
1942 In this scenario we need to create an Engine
1943 and associate a connection with the context.
1944
1945 """
1946 configuration = config.get_section(config.config_ini_section)
1947 configuration["sqlalchemy.url"] = get_url()
1948 connectable = engine_from_config(
1949 configuration,
1950 prefix="sqlalchemy.",
1951 poolclass=pool.NullPool,
1952 )
1953
1954 with connectable.connect() as connection:
1955 context.configure(connection=connection, target_metadata=target_metadata)
1956
1957 with context.begin_transaction():
1958 context.run_migrations()
1959
1960
1961if context.is_offline_mode():
1962 run_migrations_offline()
1963else:
1964 run_migrations_online()
1965
1966
1967
1968---
1969File: /neurons/miners/src/consumers/validator_consumer.py
1970---
1971
1972import asyncio
1973import logging
1974import time
1975from typing import Annotated
1976
1977import bittensor
1978from datura.consumers.base import BaseConsumer
1979from datura.requests.miner_requests import (
1980 AcceptJobRequest,
1981 AcceptSSHKeyRequest,
1982 DeclineJobRequest,
1983 Executor,
1984 ExecutorSSHInfo,
1985 FailedRequest,
1986 UnAuthorizedRequest,
1987)
1988from datura.requests.validator_requests import (
1989 AuthenticateRequest,
1990 BaseValidatorRequest,
1991 SSHPubKeyRemoveRequest,
1992 SSHPubKeySubmitRequest,
1993)
1994from fastapi import Depends, WebSocket
1995
1996from core.config import settings
1997from services.executor_service import ExecutorService
1998from services.ssh_service import MinerSSHService
1999from services.validator_service import ValidatorService
2000
2001AUTH_MESSAGE_MAX_AGE = 10
2002MAX_MESSAGE_COUNT = 10
2003
2004logger = logging.getLogger(__name__)
2005
2006
2007class ValidatorConsumer(BaseConsumer):
2008 def __init__(
2009 self,
2010 websocket: WebSocket,
2011 validator_key: str,
2012 ssh_service: Annotated[MinerSSHService, Depends(MinerSSHService)],
2013 validator_service: Annotated[ValidatorService, Depends(ValidatorService)],
2014 executor_service: Annotated[ExecutorService, Depends(ExecutorService)],
2015 ):
2016 super().__init__(websocket)
2017 self.ssh_service = ssh_service
2018 self.validator_service = validator_service
2019 self.executor_service = executor_service
2020 self.validator_key = validator_key
2021 self.my_hotkey = settings.get_bittensor_wallet().get_hotkey().ss58_address
2022 self.validator_authenticated = False
2023 self.msg_queue = []
2024
2025 def accepted_request_type(self):
2026 return BaseValidatorRequest
2027
2028 def verify_auth_msg(self, msg: AuthenticateRequest) -> tuple[bool, str]:
2029 if msg.payload.timestamp < time.time() - AUTH_MESSAGE_MAX_AGE:
2030 return False, "msg too old"
2031 if msg.payload.miner_hotkey != self.my_hotkey:
2032 return False, f"wrong miner hotkey ({self.my_hotkey}!={msg.payload.miner_hotkey})"
2033 if msg.payload.validator_hotkey != self.validator_key:
2034 return (
2035 False,
2036 f"wrong validator hotkey ({self.validator_key}!={msg.payload.validator_hotkey})",
2037 )
2038
2039 keypair = bittensor.Keypair(ss58_address=self.validator_key)
2040        if keypair.verify(msg.blob_for_signing(), msg.signature):
2041            return True, ""
2042        return False, "incorrect signature"
2043 async def handle_authentication(self, msg: AuthenticateRequest):
2044 # check if validator is registered
2045 if not self.validator_service.is_valid_validator(self.validator_key):
2046 await self.send_message(UnAuthorizedRequest(details="Validator is not registered"))
2047 await self.disconnect()
2048 return
2049
2050 authenticated, error_msg = self.verify_auth_msg(msg)
2051 if not authenticated:
2052 response_msg = f"Validator {self.validator_key} not authenticated due to: {error_msg}"
2053 logger.info(response_msg)
2054 await self.send_message(UnAuthorizedRequest(details=response_msg))
2055 await self.disconnect()
2056 return
2057
2058 self.validator_authenticated = True
2059 for msg in self.msg_queue:
2060 await self.handle_message(msg)
2061
2062 async def check_validator_allowance(self):
2063        """Check whether any executors are opened for the current validator.
2064
2065        If executors are available, send an accept-job request to the validator with the
2066        list of executors available to it.
2067
2068        If there are none, send a decline-job request and disconnect.
2069        """
2070 executors = self.executor_service.get_executors_for_validator(self.validator_key)
2071 if len(executors):
2072 logger.info("Found %d executors for validator(%s)", len(executors), self.validator_key)
2073 await self.send_message(
2074 AcceptJobRequest(
2075 executors=[
2076 Executor(uuid=str(executor.uuid), address=executor.address, port=executor.port)
2077 for executor in executors
2078 ]
2079 )
2080 )
2081 else:
2082            logger.info("No executors found for validator(%s)", self.validator_key)
2083 await self.send_message(DeclineJobRequest())
2084 await self.disconnect()
2085
2086 async def handle_message(self, msg: BaseValidatorRequest):
2087 if isinstance(msg, AuthenticateRequest):
2088 await self.handle_authentication(msg)
2089 if self.validator_authenticated:
2090 await self.check_validator_allowance()
2091 return
2092
2093        # TODO: update logic here; for now, it sends AcceptJobRequest regardless
2094 # if self.validator_authenticated:
2095 # await self.send_message(AcceptJobRequest())
2096
2097 if not self.validator_authenticated:
2098 if len(self.msg_queue) <= MAX_MESSAGE_COUNT:
2099 self.msg_queue.append(msg)
2100 return
2101
2102 if isinstance(msg, SSHPubKeySubmitRequest):
2103 logger.info("Validator %s sent SSH Pubkey.", self.validator_key)
2104
2105 try:
2106 msg: SSHPubKeySubmitRequest
2107 executors: list[ExecutorSSHInfo] = await self.executor_service.register_pubkey(
2108 self.validator_key, msg.public_key, msg.executor_id
2109 )
2110 await self.send_message(AcceptSSHKeyRequest(executors=executors))
2111 logger.info("Sent AcceptSSHKeyRequest to validator %s", self.validator_key)
2112 except Exception as e:
2113 logger.error("Storing SSH key or Sending AcceptSSHKeyRequest failed: %s", str(e))
2114 self.ssh_service.remove_pubkey_from_host(msg.public_key)
2115 await self.send_message(FailedRequest(details=str(e)))
2116 return
2117
2118 if isinstance(msg, SSHPubKeyRemoveRequest):
2119 logger.info("Validator %s sent remove SSH Pubkey.", self.validator_key)
2120 try:
2121 await self.executor_service.deregister_pubkey(self.validator_key, msg.public_key, msg.executor_id)
2122                logger.info("Removed SSH key(s) for validator %s", self.validator_key)
2123 except Exception as e:
2124 logger.error("Failed SSHKeyRemoved request: %s", str(e))
2125 await self.send_message(FailedRequest(details=str(e)))
2126 return
2127
2128
2129class ValidatorConsumerManger:
2130 def __init__(
2131 self,
2132 ):
2133 self.active_consumer: ValidatorConsumer | None = None
2134 self.lock = asyncio.Lock()
2135
2136 async def addConsumer(
2137 self,
2138 websocket: WebSocket,
2139 validator_key: str,
2140 ssh_service: Annotated[MinerSSHService, Depends(MinerSSHService)],
2141 validator_service: Annotated[ValidatorService, Depends(ValidatorService)],
2142 executor_service: Annotated[ExecutorService, Depends(ExecutorService)],
2143 ):
2144 consumer = ValidatorConsumer(
2145 websocket=websocket,
2146 validator_key=validator_key,
2147 ssh_service=ssh_service,
2148 validator_service=validator_service,
2149 executor_service=executor_service,
2150 )
2151 await consumer.connect()
2152
2153 if self.active_consumer is not None:
2154 await consumer.send_message(DeclineJobRequest())
2155 await consumer.disconnect()
2156 return
2157
2158 async with self.lock:
2159 self.active_consumer = consumer
2160
2161 await self.active_consumer.handle()
2162
2163 self.active_consumer = None
2164
2165
2166validatorConsumerManager = ValidatorConsumerManger()
2167
2168
2169
2170---
2171File: /neurons/miners/src/core/__init__.py
2172---
2173
2174
2175
2176
2177---
2178File: /neurons/miners/src/core/config.py
2179---
2180
2181from typing import TYPE_CHECKING
2182import argparse
2183import pathlib
2184
2185import bittensor
2186from pydantic import Field
2187from pydantic_settings import BaseSettings, SettingsConfigDict
2188
2189if TYPE_CHECKING:
2190 from bittensor_wallet import Wallet
2191
2192
2193class Settings(BaseSettings):
2194 model_config = SettingsConfigDict(env_file=".env", extra="ignore")
2195 PROJECT_NAME: str = "compute-subnet-miner"
2196
2197 BITTENSOR_WALLET_DIRECTORY: pathlib.Path = Field(
2198 env="BITTENSOR_WALLET_DIRECTORY",
2199 default=pathlib.Path("~").expanduser() / ".bittensor" / "wallets",
2200 )
2201 BITTENSOR_WALLET_NAME: str = Field(env="BITTENSOR_WALLET_NAME")
2202 BITTENSOR_WALLET_HOTKEY_NAME: str = Field(env="BITTENSOR_WALLET_HOTKEY_NAME")
2203 BITTENSOR_NETUID: int = Field(env="BITTENSOR_NETUID")
2204 BITTENSOR_CHAIN_ENDPOINT: str | None = Field(env="BITTENSOR_CHAIN_ENDPOINT", default=None)
2205 BITTENSOR_NETWORK: str = Field(env="BITTENSOR_NETWORK")
2206
2207 SQLALCHEMY_DATABASE_URI: str = Field(env="SQLALCHEMY_DATABASE_URI")
2208
2209 EXTERNAL_IP_ADDRESS: str = Field(env="EXTERNAL_IP_ADDRESS")
2210 INTERNAL_PORT: int = Field(env="INTERNAL_PORT", default=8000)
2211 EXTERNAL_PORT: int = Field(env="EXTERNAL_PORT", default=8000)
2212 ENV: str = Field(env="ENV", default="dev")
2213 DEBUG: bool = Field(env="DEBUG", default=False)
2214
2215 def get_bittensor_wallet(self) -> "Wallet":
2216 if not self.BITTENSOR_WALLET_NAME or not self.BITTENSOR_WALLET_HOTKEY_NAME:
2217 raise RuntimeError("Wallet not configured")
2218 wallet = bittensor.wallet(
2219 name=self.BITTENSOR_WALLET_NAME,
2220 hotkey=self.BITTENSOR_WALLET_HOTKEY_NAME,
2221 path=str(self.BITTENSOR_WALLET_DIRECTORY),
2222 )
2223 wallet.hotkey_file.get_keypair() # this raises errors if the keys are inaccessible
2224 return wallet
2225
2226 def get_bittensor_config(self) -> bittensor.config:
2227 parser = argparse.ArgumentParser()
2228 # bittensor.wallet.add_args(parser)
2229 # bittensor.subtensor.add_args(parser)
2230 # bittensor.axon.add_args(parser)
2231
2232 if self.BITTENSOR_NETWORK:
2233 if "--subtensor.network" in parser._option_string_actions:
2234 parser._handle_conflict_resolve(
2235 None,
2236 [("--subtensor.network", parser._option_string_actions["--subtensor.network"])],
2237 )
2238
2239 parser.add_argument(
2240 "--subtensor.network",
2241 type=str,
2242 help="network",
2243 default=self.BITTENSOR_NETWORK,
2244 )
2245
2246 if self.BITTENSOR_CHAIN_ENDPOINT:
2247 if "--subtensor.chain_endpoint" in parser._option_string_actions:
2248 parser._handle_conflict_resolve(
2249 None,
2250 [
2251 (
2252 "--subtensor.chain_endpoint",
2253 parser._option_string_actions["--subtensor.chain_endpoint"],
2254 )
2255 ],
2256 )
2257
2258 parser.add_argument(
2259 "--subtensor.chain_endpoint",
2260 type=str,
2261 help="chain endpoint",
2262 default=self.BITTENSOR_CHAIN_ENDPOINT,
2263 )
2264
2265 return bittensor.config(parser)
2266
2267
2268settings = Settings()
2269
2270
2271
2272---
2273File: /neurons/miners/src/core/db.py
2274---
2275
2276from collections.abc import Generator
2277from typing import Annotated
2278
2279from fastapi import Depends
2280from sqlmodel import Session, create_engine
2281
2282from core.config import settings
2283
2284engine = create_engine(str(settings.SQLALCHEMY_DATABASE_URI))
2285
2286
2287def get_db() -> Generator[Session, None, None]:
2288 with Session(engine) as session:
2289 yield session
2290
2291
2292SessionDep = Annotated[Session, Depends(get_db)]
2293
2294
2295
2296---
2297File: /neurons/miners/src/core/miner.py
2298---
2299
2300from typing import TYPE_CHECKING
2301import logging
2302import traceback
2303import asyncio
2304import bittensor
2305from websockets.protocol import State as WebSocketClientState
2306
2307from core.config import settings
2308from core.db import get_db
2309from core.utils import _m, get_extra_info
2310from daos.validator import ValidatorDao, Validator
2311
2312if TYPE_CHECKING:
2313 from bittensor_wallet import Wallet
2314
2315logger = logging.getLogger(__name__)
2316
2317MIN_STAKE = 10
2318VALIDATORS_LIMIT = 24
2319SYNC_CYCLE = 2 * 60
2320
2321
2322class Miner:
2323 wallet: "Wallet"
2324 subtensor: bittensor.subtensor
2325 netuid: int
2326
2327 def __init__(self):
2328 self.config = settings.get_bittensor_config()
2329 self.wallet = settings.get_bittensor_wallet()
2330 self.netuid = settings.BITTENSOR_NETUID
2331
2332 self.default_extra = {
2333 "external_port": settings.EXTERNAL_PORT,
2334 "external_ip": settings.EXTERNAL_IP_ADDRESS,
2335 }
2336
2337 self.axon = bittensor.axon(
2338 wallet=self.wallet,
2339 external_port=settings.EXTERNAL_PORT,
2340 external_ip=settings.EXTERNAL_IP_ADDRESS,
2341 port=settings.INTERNAL_PORT,
2342 ip=settings.EXTERNAL_IP_ADDRESS,
2343 )
2344 self.subtensor = None
2345 self.set_subtensor()
2346
2347 self.should_exit = False
2348 self.session = next(get_db())
2349 self.validator_dao = ValidatorDao(session=self.session)
2350 self.last_announced_block = 0
2351
2352 def set_subtensor(self):
2353 try:
2354 if (
2355 self.subtensor
2356 and self.subtensor.substrate
2357 and self.subtensor.substrate.websocket
2358 and self.subtensor.substrate.websocket.state is WebSocketClientState.OPEN
2359 ):
2360 return
2361
2362 logger.info(
2363 _m(
2364 "Getting subtensor",
2365 extra=get_extra_info(self.default_extra),
2366 ),
2367 )
2368
2369 self.subtensor = bittensor.subtensor(config=self.config)
2370
2371 # check registered
2372 self.check_registered()
2373 except Exception as e:
2374            logger.error(
2375 _m(
2376 "[Error] Getting subtensor",
2377 extra=get_extra_info({
2378 ** self.default_extra,
2379 "error": str(e),
2380 }),
2381 ),
2382 )
2383
2384 def check_registered(self):
2385 try:
2386 logger.info(
2387 _m(
2388 '[check_registered] checking miner is registered',
2389 extra=get_extra_info(self.default_extra),
2390 ),
2391 )
2392
2393 if not self.subtensor.is_hotkey_registered(
2394 netuid=self.netuid,
2395 hotkey_ss58=self.wallet.get_hotkey().ss58_address,
2396 ):
2397 logger.error(
2398 _m(
2399 f"[check_registered] Wallet: {self.wallet} is not registered on netuid {self.netuid}.",
2400 extra=get_extra_info(self.default_extra),
2401 ),
2402 )
2403 exit()
2404 except Exception as e:
2405 logger.error(
2406 _m(
2407 '[check_registered] Checking miner registered failed',
2408 extra=get_extra_info({
2409 **self.default_extra,
2410 "error": str(e)
2411 }),
2412 ),
2413 )
2414
2415 def get_node(self):
2416 # return SubstrateInterface(url=self.config.subtensor.chain_endpoint)
2417 return self.subtensor.substrate
2418
2419 def get_current_block(self):
2420 node = self.get_node()
2421 return node.query("System", "Number", []).value
2422
2423 def get_tempo(self):
2424 return self.subtensor.tempo(self.netuid)
2425
2426 def get_serving_rate_limit(self):
2427 node = self.get_node()
2428 return node.query("SubtensorModule", "ServingRateLimit", [self.netuid]).value
2429
2430 def announce(self):
2431 try:
2432 current_block = self.get_current_block()
2433 tempo = self.get_tempo()
2434
2435 if current_block - self.last_announced_block >= tempo:
2436 self.last_announced_block = current_block
2437
2438 logger.info(
2439 _m(
2440 '[announce] Announce miner',
2441 extra=get_extra_info(self.default_extra),
2442 ),
2443 )
2444 self.axon.serve(netuid=self.netuid, subtensor=self.subtensor)
2445 except Exception as e:
2446 logger.error(
2447 _m(
2448 '[announce] Announcing miner error',
2449 extra=get_extra_info({
2450 **self.default_extra,
2451 "error": str(e)
2452 }),
2453 ),
2454 )
2455
2456 async def fetch_validators(self):
2457 metagraph = self.subtensor.metagraph(netuid=self.netuid)
2458 neurons = [n for n in metagraph.neurons if (n.stake.tao >= MIN_STAKE)]
2459 return neurons[:VALIDATORS_LIMIT]
2460
2461 async def save_validators(self, validators):
2462 logger.info(
2463 _m(
2464 '[save_validators] Sync validators',
2465 extra=get_extra_info(self.default_extra),
2466 ),
2467 )
2468 for v in validators:
2469 existing = self.validator_dao.get_validator_by_hotkey(v.hotkey)
2470 if not existing:
2471 self.validator_dao.save(
2472 Validator(
2473 validator_hotkey=v.hotkey,
2474 active=True
2475 )
2476 )
2477
2478 async def sync(self):
2479 try:
2480 self.set_subtensor()
2481
2482 self.announce()
2483
2484 validators = await self.fetch_validators()
2485 await self.save_validators(validators)
2486 except Exception as e:
2487 logger.error(
2488 _m(
2489 '[sync] Miner sync failed',
2490 extra=get_extra_info({
2491 **self.default_extra,
2492 "error": str(e)
2493 }),
2494 ),
2495 )
2496
2497 async def start(self):
2498 logger.info(
2499 _m(
2500 'Start Miner in background',
2501 extra=get_extra_info(self.default_extra),
2502 ),
2503 )
2504 try:
2505 while not self.should_exit:
2506 await self.sync()
2507
2508 # sync every 2 mins
2509 await asyncio.sleep(SYNC_CYCLE)
2510 except KeyboardInterrupt:
2511 logger.debug('Miner killed by keyboard interrupt.')
2512 exit()
2513 except Exception as e:
2514 logger.error(traceback.format_exc())
2515
2516 async def stop(self):
2517 logger.info(
2518 _m(
2519 'Stop Miner process',
2520 extra=get_extra_info(self.default_extra),
2521 ),
2522 )
2523 self.should_exit = True
2524
2525
2526
2527---
2528File: /neurons/miners/src/core/utils.py
2529---
2530
2531import asyncio
2532import contextvars
2533import json
2534import logging
2535
2536from core.config import settings
2537
2538logger = logging.getLogger(__name__)
2539
2540# Create a ContextVar to hold the context information
2541context = contextvars.ContextVar("context", default="ValidatorService")
2542context.set("ValidatorService")
2543
2544
2545def wait_for_services_sync(timeout=30):
2546 """Wait until PostgreSQL connections are working."""
2547 from sqlalchemy import create_engine, text
2548
2549 from core.config import settings
2550
2551 logger.info("Waiting for services to be available...")
2552
2553 while True:
2554 try:
2555 # Check PostgreSQL connection using SQLAlchemy
2556 engine = create_engine(settings.SQLALCHEMY_DATABASE_URI)
2557 with engine.connect() as connection:
2558 connection.execute(text("SELECT 1"))
2559 logger.info("Connected to PostgreSQL.")
2560
2561 break
2562 except Exception as e:
2563 logger.error("Failed to connect to PostgreSQL.")
2564 raise e
2565
2566
2567def get_extra_info(extra: dict) -> dict:
2568 task = asyncio.current_task()
2569 coro_name = task.get_coro().__name__ if task else "NoTask"
2570 task_id = id(task) if task else "NoTaskID"
2571 extra_info = {
2572 "coro_name": coro_name,
2573 "task_id": task_id,
2574 **extra,
2575 }
2576 return extra_info
2577
2578
2579def configure_logs_of_other_modules():
2580 miner_hotkey = settings.get_bittensor_wallet().get_hotkey().ss58_address
2581
2582 logging.basicConfig(
2583 level=logging.INFO,
2584 format=f"Miner: {miner_hotkey} | Name: %(name)s | Time: %(asctime)s | Level: %(levelname)s | File: %(filename)s | Function: %(funcName)s | Line: %(lineno)s | Process: %(process)d | Message: %(message)s",
2585 )
2586
2587 sqlalchemy_logger = logging.getLogger("sqlalchemy")
2588 sqlalchemy_logger.setLevel(logging.WARNING)
2589
2590 # Create a custom formatter that adds the context to the log messages
2591 class CustomFormatter(logging.Formatter):
2592 def format(self, record):
2593 try:
2594 task = asyncio.current_task()
2595 coro_name = task.get_coro().__name__ if task else "NoTask"
2596 task_id = id(task) if task else "NoTaskID"
2597 return f"{getattr(record, 'context', 'Default')} | {coro_name} | {task_id} | {super().format(record)}"
2598 except Exception:
2599 return ""
2600
2601 # Create a handler for the logger
2602 handler = logging.StreamHandler()
2603
2604 # Set the formatter for the handler
2605 handler.setFormatter(
2606 CustomFormatter("%(name)s %(asctime)s %(levelname)s %(filename)s %(process)d %(message)s")
2607 )
2608
2609
2610class StructuredMessage:
2611 def __init__(self, message, extra: dict):
2612 self.message = message
2613 self.extra = extra
2614
2615 def __str__(self):
2616 return "%s >>> %s" % (self.message, json.dumps(self.extra)) # noqa
2617
2618
2619_m = StructuredMessage
2620
2621
2622
2623---
2624File: /neurons/miners/src/daos/__init__.py
2625---
2626
2627
2628
2629
2630---
2631File: /neurons/miners/src/daos/base.py
2632---
2633
2634from core.db import SessionDep
2635
2636
2637class BaseDao:
2638 def __init__(self, session: SessionDep):
2639 self.session = session
2640
2641
2642
2643---
2644File: /neurons/miners/src/daos/executor.py
2645---
2646
2647from typing import Optional
2648from daos.base import BaseDao
2649from models.executor import Executor
2650
2651
2652class ExecutorDao(BaseDao):
2653 def save(self, executor: Executor) -> Executor:
2654 self.session.add(executor)
2655 self.session.commit()
2656 self.session.refresh(executor)
2657 return executor
2658
2659 def delete_by_address_port(self, address: str, port: int) -> None:
2660 executor = self.session.query(Executor).filter_by(
2661 address=address, port=port).first()
2662 if executor:
2663 self.session.delete(executor)
2664 self.session.commit()
2665
2666 def get_executors_for_validator(self, validator_key: str, executor_id: Optional[str] = None) -> list[Executor]:
2667        """Get executors that are opened to the validator.
2668
2669 Args:
2670 validator_key (str): validator hotkey string
2671
2672 Return:
2673 List[Executor]: list of Executors
2674 """
2675 if executor_id:
2676 return list(self.session.query(Executor).filter_by(validator=validator_key, uuid=executor_id))
2677
2678 return list(self.session.query(Executor).filter_by(validator=validator_key))
2679
2680 def get_all_executors(self) -> list[Executor]:
2681 return list(self.session.query(Executor).all())
2682
2683
2684
2685---
2686File: /neurons/miners/src/daos/validator.py
2687---
2688
2689from daos.base import BaseDao
2690
2691from models.validator import Validator
2692
2693
2694class ValidatorDao(BaseDao):
2695 def save(self, validator: Validator) -> Validator:
2696 self.session.add(validator)
2697 self.session.commit()
2698 self.session.refresh(validator)
2699 return validator
2700
2701 def get_validator_by_hotkey(self, hotkey: str):
2702 return self.session.query(Validator).filter_by(validator_hotkey=hotkey).first()
2703
2704
2705
2706---
2707File: /neurons/miners/src/models/__init__.py
2708---
2709
2710
2711
2712
2713---
2714File: /neurons/miners/src/models/executor.py
2715---
2716
2717import uuid
2718from uuid import UUID
2719
2720from sqlmodel import Field, SQLModel, UniqueConstraint
2721
2722
2723class Executor(SQLModel, table=True):
2724 """Task model."""
2725
2726 __table_args__ = (UniqueConstraint("address", "port", name="unique_contraint_address_port"),)
2727
2728 uuid: UUID | None = Field(default_factory=uuid.uuid4, primary_key=True)
2729 address: str
2730 port: int
2731 validator: str
2732
2733 def __str__(self):
2734 return f"{self.address}:{self.port}"
2735
2736
2737
2738---
2739File: /neurons/miners/src/models/validator.py
2740---
2741
2742import uuid
2743from uuid import UUID
2744
2745from sqlmodel import Field, SQLModel
2746
2747
2748class Validator(SQLModel, table=True):
2749 """Task model."""
2750
2751 uuid: UUID | None = Field(default_factory=uuid.uuid4, primary_key=True)
2752 validator_hotkey: str = Field(unique=True)
2753 active: bool
2754
2755
2756
2757---
2758File: /neurons/miners/src/routes/__init__.py
2759---
2760
2761
2762
2763
2764---
2765File: /neurons/miners/src/routes/debug_routes.py
2766---
2767
2768from typing import Annotated
2769
2770from fastapi import APIRouter, Depends
2771
2772from core.config import settings
2773from services.executor_service import ExecutorService
2774
2775debug_apis_router = APIRouter()
2776
2777
2778@debug_apis_router.get("/debug/get-executors-for-validator/{validator_hotkey}")
2779async def get_executors_for_validator(
2780 validator_hotkey: str, executor_service: Annotated[ExecutorService, Depends(ExecutorService)]
2781):
2782 if not settings.DEBUG:
2783 return None
2784 return executor_service.get_executors_for_validator(validator_hotkey)
2785
2786
2787@debug_apis_router.post("/debug/register_pubkey/{validator_hotkey}")
2788async def register_pubkey(
2789 validator_hotkey: str, executor_service: Annotated[ExecutorService, Depends(ExecutorService)]
2790):
2791 if not settings.DEBUG:
2792 return None
2793 pub_key = "Test Pubkey"
2794 return await executor_service.register_pubkey(validator_hotkey, pub_key.encode("utf-8"))
2795
2796
2797@debug_apis_router.post("/debug/remove_pubkey/{validator_hotkey}")
2798async def remove_pubkey_from_executor(
2799 validator_hotkey: str, executor_service: Annotated[ExecutorService, Depends(ExecutorService)]
2800):
2801 if not settings.DEBUG:
2802 return None
2803 pub_key = "Test Pubkey"
2804 await executor_service.deregister_pubkey(validator_hotkey, pub_key.encode("utf-8"))
2805
2806
2807
2808---
2809File: /neurons/miners/src/routes/validator_interface.py
2810---
2811
2812from typing import Annotated
2813
2814from fastapi import APIRouter, Depends, WebSocket
2815
2816from consumers.validator_consumer import ValidatorConsumer
2817validator_router = APIRouter()
2818
2819
2820@validator_router.websocket("/jobs/{validator_key}")
2821async def validator_interface(consumer: Annotated[ValidatorConsumer, Depends(ValidatorConsumer)]):
2822 await consumer.connect()
2823 await consumer.handle()
2824
2825
2826@validator_router.websocket("/resources/{validator_key}")
2827async def validator_resources_interface(consumer: Annotated[ValidatorConsumer, Depends(ValidatorConsumer)]):
2828 await consumer.connect()
2829 await consumer.handle()
2830
2831
2832
2833---
2834File: /neurons/miners/src/services/executor_service.py
2835---
2836
2837import asyncio
2838import json
2839import logging
2840from typing import Annotated, Optional
2841
2842import aiohttp
2843import bittensor
2844from datura.requests.miner_requests import ExecutorSSHInfo
2845from fastapi import Depends
2846
2847from core.config import settings
2848from daos.executor import ExecutorDao
2849from models.executor import Executor
2850
2851logging.basicConfig(level=logging.INFO)
2852logger = logging.getLogger(__name__)
2853
2854
2855class ExecutorService:
2856 def __init__(self, executor_dao: Annotated[ExecutorDao, Depends(ExecutorDao)]):
2857 self.executor_dao = executor_dao
2858
2859 def get_executors_for_validator(self, validator_hotkey: str, executor_id: Optional[str] = None):
2860 return self.executor_dao.get_executors_for_validator(validator_hotkey, executor_id)
2861
2862 async def send_pubkey_to_executor(
2863 self, executor: Executor, pubkey: str
2864 ) -> ExecutorSSHInfo | None:
2865        """Send an API request to the executor with the validator's pubkey.
2866
2867 Args:
2868 executor (Executor): Executor instance that register validator hotkey
2869 pubkey (str): SSH public key from validator
2870
2871 Return:
2872 response (ExecutorSSHInfo | None): Executor SSH connection info.
2873 """
2874        timeout = aiohttp.ClientTimeout(total=10)  # 10 second timeout
2875 url = f"http://{executor.address}:{executor.port}/upload_ssh_key"
2876 keypair: bittensor.Keypair = settings.get_bittensor_wallet().get_hotkey()
2877 payload = {"public_key": pubkey, "signature": f"0x{keypair.sign(pubkey).hex()}"}
2878 async with aiohttp.ClientSession(timeout=timeout) as session:
2879 try:
2880 async with session.post(url, json=payload) as response:
2881 if response.status != 200:
2882 logger.error("API request failed to register SSH key. url=%s", url)
2883 return None
2884 response_obj: dict = await response.json()
2885 logger.info(
2886 "Get response from Executor(%s:%s): %s",
2887 executor.address,
2888 executor.port,
2889 json.dumps(response_obj),
2890 )
2891 response_obj["uuid"] = str(executor.uuid)
2892 response_obj["address"] = executor.address
2893 response_obj["port"] = executor.port
2894 return ExecutorSSHInfo.parse_obj(response_obj)
2895 except Exception as e:
2896 logger.error(
2897 "API request failed to register SSH key. url=%s, error=%s", url, str(e)
2898 )
2899
2900 async def remove_pubkey_from_executor(self, executor: Executor, pubkey: str):
2901        """Send an API request to the executor to clean up the pubkey.
2902
2903 Args:
2904 executor (Executor): Executor instance that needs to remove pubkey
2905 """
2906        timeout = aiohttp.ClientTimeout(total=10)  # 10 second timeout
2907 url = f"http://{executor.address}:{executor.port}/remove_ssh_key"
2908 keypair: bittensor.Keypair = settings.get_bittensor_wallet().get_hotkey()
2909 payload = {"public_key": pubkey, "signature": f"0x{keypair.sign(pubkey).hex()}"}
2910 async with aiohttp.ClientSession(timeout=timeout) as session:
2911 try:
2912 async with session.post(url, json=payload) as response:
2913 if response.status != 200:
2914                        logger.error("API request failed to remove SSH key. url=%s", url)
2915 return None
2916 except Exception as e:
2917 logger.error(
2918                    "API request failed to remove SSH key. url=%s, error=%s", url, str(e)
2919 )
2920
2921 async def register_pubkey(self, validator_hotkey: str, pubkey: bytes, executor_id: Optional[str] = None):
2922 """Register pubkeys to executors for given validator.
2923
2924 Args:
2925 validator_hotkey (str): Validator hotkey
2926 pubkey (bytes): SSH pubkey from validator.
2927
2928 Return:
2929 List[dict/object]: Executors SSH connection infos that accepted validator pubkey.
2930 """
2931 tasks = [
2932 asyncio.create_task(
2933 self.send_pubkey_to_executor(executor, pubkey.decode("utf-8")),
2934 name=f"{executor}.send_pubkey_to_executor",
2935 )
2936 for executor in self.get_executors_for_validator(validator_hotkey, executor_id)
2937 ]
2938
2939 total_executors = len(tasks)
2940 results = [
2941 result for result in await asyncio.gather(*tasks, return_exceptions=True) if result
2942 ]
2943 logger.info(
2944            "Sent pubkey registration API requests to %d executors and received results from %d executors",
2945 total_executors,
2946 len(results),
2947 )
2948 return results
2949
2950 async def deregister_pubkey(self, validator_hotkey: str, pubkey: bytes, executor_id: Optional[str] = None):
2951 """Deregister pubkey from executors.
2952
2953 Args:
2954 validator_hotkey (str): Validator hotkey
2955 pubkey (bytes): validator pubkey
2956 """
2957 tasks = [
2958 asyncio.create_task(
2959 self.remove_pubkey_from_executor(executor, pubkey.decode("utf-8")),
2960 name=f"{executor}.remove_pubkey_from_executor",
2961 )
2962 for executor in self.get_executors_for_validator(validator_hotkey, executor_id)
2963 ]
2964 await asyncio.gather(*tasks, return_exceptions=True)
2965
2966
2967
2968---
2969File: /neurons/miners/src/services/ssh_service.py
2970---
2971
2972import getpass
2973import os
2974
2975
2976class MinerSSHService:
2977 def add_pubkey_to_host(self, pub_key: bytes):
2978 with open(os.path.expanduser("~/.ssh/authorized_keys"), "a") as file:
2979 file.write(pub_key.decode() + "\n")
2980
2981 def remove_pubkey_from_host(self, pub_key: bytes):
2982 pub_key_str = pub_key.decode().strip()
2983 authorized_keys_path = os.path.expanduser("~/.ssh/authorized_keys")
2984
2985 with open(authorized_keys_path, "r") as file:
2986 lines = file.readlines()
2987
2988 with open(authorized_keys_path, "w") as file:
2989 for line in lines:
2990 if line.strip() != pub_key_str:
2991 file.write(line)
2992
2993 def get_current_os_user(self) -> str:
2994 return getpass.getuser()
2995
2996
2997
2998---
2999File: /neurons/miners/src/services/validator_service.py
3000---
3001
3002from typing import Annotated
3003
3004from fastapi import Depends
3005
3006from daos.validator import ValidatorDao
3007
3008
3009class ValidatorService:
3010 def __init__(self, validator_dao: Annotated[ValidatorDao, Depends(ValidatorDao)]):
3011 self.validator_dao = validator_dao
3012
3013 def is_valid_validator(self, validator_hotkey: str) -> bool:
3014        return bool(self.validator_dao.get_validator_by_hotkey(validator_hotkey))
3015
3016
3017
3018---
3019File: /neurons/miners/src/_miner.py
3020---
3021
3022# The MIT License (MIT)
3023# Copyright © 2023 Yuma Rao
3024# TODO(developer): Set your name
3025# Copyright © 2023 <your name>
3026
3027# Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated
3028# documentation files (the “Software”), to deal in the Software without restriction, including without limitation
3029# the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software,
3030# and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
3031
3032# The above copyright notice and this permission notice shall be included in all copies or substantial portions of
3033# the Software.
3034
3035# THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
3036# THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
3037# THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION
3038# OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
3039# DEALINGS IN THE SOFTWARE.
3040
3041import time
3042
3043import bittensor as bt
3044
3045# Bittensor Miner Template:
3046import template
3047
3048# import base miner class which takes care of most of the boilerplate
3049from template.base.miner import BaseMinerNeuron
3050
3051
3052class Miner(BaseMinerNeuron):
3053 """
3054 Your miner neuron class. You should use this class to define your miner's behavior. In particular, you should replace the forward function with your own logic. You may also want to override the blacklist and priority functions according to your needs.
3055
3056 This class inherits from the BaseMinerNeuron class, which in turn inherits from BaseNeuron. The BaseNeuron class takes care of routine tasks such as setting up wallet, subtensor, metagraph, logging directory, parsing config, etc. You can override any of the methods in BaseNeuron if you need to customize the behavior.
3057
3058    This class provides reasonable default behavior for a miner such as blacklisting unrecognized hotkeys, prioritizing requests based on stake, and forwarding requests to the forward function. If you need to define custom behavior, override these methods as needed.
3059 """
3060
3061 def __init__(self, config=None):
3062 super().__init__(config=config)
3063
3064 # TODO(developer): Anything specific to your use case you can do here
3065
3066 async def forward(self, synapse: template.protocol.Dummy) -> template.protocol.Dummy:
3067 """
3068 Processes the incoming 'Dummy' synapse by performing a predefined operation on the input data.
3069 This method should be replaced with actual logic relevant to the miner's purpose.
3070
3071 Args:
3072 synapse (template.protocol.Dummy): The synapse object containing the 'dummy_input' data.
3073
3074 Returns:
3075 template.protocol.Dummy: The synapse object with the 'dummy_output' field set to twice the 'dummy_input' value.
3076
3077 The 'forward' function is a placeholder and should be overridden with logic that is appropriate for
3078 the miner's intended operation. This method demonstrates a basic transformation of input data.
3079 """
3080 # TODO(developer): Replace with actual implementation logic.
3081 synapse.dummy_output = synapse.dummy_input * 2
3082 return synapse
3083
3084 async def blacklist(self, synapse: template.protocol.Dummy) -> tuple[bool, str]:
3085 """
3086 Determines whether an incoming request should be blacklisted and thus ignored. Your implementation should
3087 define the logic for blacklisting requests based on your needs and desired security parameters.
3088
3089 Blacklist runs before the synapse data has been deserialized (i.e. before synapse.data is available).
3090        The synapse is instead constructed from the headers of the request. It is important to blacklist
3091 requests before they are deserialized to avoid wasting resources on requests that will be ignored.
3092
3093 Args:
3094 synapse (template.protocol.Dummy): A synapse object constructed from the headers of the incoming request.
3095
3096 Returns:
3097 Tuple[bool, str]: A tuple containing a boolean indicating whether the synapse's hotkey is blacklisted,
3098 and a string providing the reason for the decision.
3099
3100 This function is a security measure to prevent resource wastage on undesired requests. It should be enhanced
3101 to include checks against the metagraph for entity registration, validator status, and sufficient stake
3102 before deserialization of synapse data to minimize processing overhead.
3103
3104 Example blacklist logic:
3105 - Reject if the hotkey is not a registered entity within the metagraph.
3106 - Consider blacklisting entities that are not validators or have insufficient stake.
3107
3108 In practice it would be wise to blacklist requests from entities that are not validators, or do not have
3109 enough stake. This can be checked via metagraph.S and metagraph.validator_permit. You can always attain
3110 the uid of the sender via a metagraph.hotkeys.index( synapse.dendrite.hotkey ) call.
3111
3112 Otherwise, allow the request to be processed further.
3113 """
3114
3115 if synapse.dendrite is None or synapse.dendrite.hotkey is None:
3116 bt.logging.warning("Received a request without a dendrite or hotkey.")
3117 return True, "Missing dendrite or hotkey"
3118
3119        # TODO(developer): Define how miners should blacklist requests.
3120        if (
3121            not self.config.blacklist.allow_non_registered
3122            and synapse.dendrite.hotkey not in self.metagraph.hotkeys
3123        ):
3124            # Ignore requests from un-registered entities.
3125            bt.logging.trace(f"Blacklisting un-registered hotkey {synapse.dendrite.hotkey}")
3126            return True, "Unrecognized hotkey"
3127        uid = self.metagraph.hotkeys.index(synapse.dendrite.hotkey)  # look up uid only after the registration check
3128
3129 if self.config.blacklist.force_validator_permit:
3130 # If the config is set to force validator permit, then we should only allow requests from validators.
3131 if not self.metagraph.validator_permit[uid]:
3132 bt.logging.warning(
3133 f"Blacklisting a request from non-validator hotkey {synapse.dendrite.hotkey}"
3134 )
3135 return True, "Non-validator hotkey"
3136
3137 bt.logging.trace(f"Not Blacklisting recognized hotkey {synapse.dendrite.hotkey}")
3138 return False, "Hotkey recognized!"
3139
3140 async def priority(self, synapse: template.protocol.Dummy) -> float:
3141 """
3142 The priority function determines the order in which requests are handled. More valuable or higher-priority
3143 requests are processed before others. You should design your own priority mechanism with care.
3144
3145 This implementation assigns priority to incoming requests based on the calling entity's stake in the metagraph.
3146
3147 Args:
3148 synapse (template.protocol.Dummy): The synapse object that contains metadata about the incoming request.
3149
3150 Returns:
3151 float: A priority score derived from the stake of the calling entity.
3152
3153 Miners may receive messages from multiple entities at once. This function determines which request should be
3154 processed first. Higher values indicate that the request should be processed first. Lower values indicate
3155 that the request should be processed later.
3156
3157 Example priority logic:
3158 - A higher stake results in a higher priority value.
3159 """
3160 if synapse.dendrite is None or synapse.dendrite.hotkey is None:
3161 bt.logging.warning("Received a request without a dendrite or hotkey.")
3162 return 0.0
3163
3164 # TODO(developer): Define how miners should prioritize requests.
3165 caller_uid = self.metagraph.hotkeys.index(synapse.dendrite.hotkey) # Get the caller index.
3166 priority = float(self.metagraph.S[caller_uid]) # Return the stake as the priority.
3167 bt.logging.trace(f"Prioritizing {synapse.dendrite.hotkey} with value: {priority}")
3168 return priority
3169
3170
3171# This is the main function, which runs the miner.
3172if __name__ == "__main__":
3173 with Miner() as miner:
3174 while True:
3175 bt.logging.info(f"Miner running... {time.time()}")
3176 time.sleep(5)
3177
3178
3179
3180---
3181File: /neurons/miners/src/cli.py
3182---
3183
3184import asyncio
3185import logging
3186import uuid
3187
3188import click
3189import sqlalchemy
3190
3191from core.db import get_db
3192from daos.executor import ExecutorDao
3193from models.executor import Executor
3194
3195logging.basicConfig(level=logging.INFO)
3196logger = logging.getLogger(__name__)
3197
3198
3199async def async_add_executor(address: str, port: int, validator: str):
3200 """Add executor machine to the database"""
3201    logger.info("Adding a new executor (%s:%d) open to validator (%s)", address, port, validator)
3202 executor_dao = ExecutorDao(session=next(get_db()))
3203 try:
3204 executor = executor_dao.save(
3205 Executor(uuid=uuid.uuid4(), address=address, port=port, validator=validator)
3206 )
3207 except sqlalchemy.exc.IntegrityError as e:
3208        logger.error("Failed to add an executor: %s", str(e))
3209 else:
3210 logger.info("Added an executor(id=%s)", str(executor.uuid))
3211
3212
3213@click.group()
3214def cli():
3215 pass
3216
3217
3218@cli.command()
3219@click.option("--address", prompt="IP Address", help="IP address of executor")
3220@click.option("--port", type=int, prompt="Port", help="Port of executor")
3221@click.option(
3222 "--validator", prompt="Validator Hotkey", help="Validator hotkey that executor opens to."
3223)
3224def add_executor(address: str, port: int, validator: str):
3225 """Add executor machine to the database"""
3226 asyncio.run(async_add_executor(address, port, validator))
3227
3228
3229@cli.command()
3230@click.option("--address", prompt="IP Address", help="IP address of executor")
3231@click.option("--port", type=int, prompt="Port", help="Port of executor")
3232def remove_executor(address: str, port: int):
3233    """Remove an executor machine from the database"""
3234 logger.info("Removing executor (%s:%d)", address, port)
3235 executor_dao = ExecutorDao(session=next(get_db()))
3236 try:
3237 executor_dao.delete_by_address_port(address, port)
3238 except sqlalchemy.exc.IntegrityError as e:
3239        logger.error("Failed to remove an executor: %s", str(e))
3240 else:
3241 logger.info("Removed an executor(%s:%d)", address, port)
3242
3243
3244@cli.command()
3245def show_executors():
3246    """Show all executor machines in the database"""
3247 executor_dao = ExecutorDao(session=next(get_db()))
3248 try:
3249 for executor in executor_dao.get_all_executors():
3250 logger.info("%s:%d -> %s", executor.address, executor.port, executor.validator)
3251 except sqlalchemy.exc.IntegrityError as e:
3252        logger.error("Failed to list executors: %s", str(e))
3253
3254
3255if __name__ == "__main__":
3256 cli()
3257
3258
3259
3260---
3261File: /neurons/miners/src/gpt2-training-model.py
3262---
3263
3264import time
3265import torch
3266from datasets import load_dataset
3267from torch.utils.data import DataLoader
3268from transformers import AdamW, GPT2LMHeadModel, GPT2Tokenizer
3269
3270# Load a small dataset
3271dataset = load_dataset("wikitext", "wikitext-2-raw-v1", split="train[:1000]")
3272
3273# Initialize tokenizer and model
3274tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
3275model = GPT2LMHeadModel.from_pretrained("gpt2")
3276
3277tokenizer.pad_token = tokenizer.eos_token
3278
3279
3280# Tokenize the dataset
3281def tokenize_function(examples):
3282 return tokenizer(examples["text"], truncation=True, max_length=128, padding="max_length")
3283
3284
3285start_time = time.time()
3286tokenized_dataset = dataset.map(tokenize_function, batched=True)
3287tokenized_dataset = tokenized_dataset.remove_columns(["text"])
3288tokenized_dataset.set_format("torch")
3289
3290# Create DataLoader
3291dataloader = DataLoader(tokenized_dataset, batch_size=4, shuffle=True)
3292
3293# Training loop
3294device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
3295print("device", device)
3296model.to(device)
3297
3298
3299# Evaluation function
3300def evaluate(model, dataloader):
3301 model.eval()
3302 total_loss = 0
3303 with torch.no_grad():
3304 for batch in dataloader:
3305 inputs = batch["input_ids"].to(device)
3306 outputs = model(input_ids=inputs, labels=inputs)
3307 total_loss += outputs.loss.item()
3308 return total_loss / len(dataloader)
3309
3310
3311# Initial evaluation
3312initial_loss = evaluate(model, dataloader)
3313print(f"Initial Loss: {initial_loss:.4f}")
3314print(f"Initial Perplexity: {torch.exp(torch.tensor(initial_loss)):.4f}")
3315optimizer = AdamW(model.parameters(), lr=5e-5, no_deprecation_warning=True)
3316
3317num_epochs = 1
3318for epoch in range(num_epochs):
3319 model.train()
3320 for batch in dataloader:
3321 batch = {k: v.to(device) for k, v in batch.items()}
3322 outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
3323 loss = outputs.loss
3324 loss.backward()
3325 optimizer.step()
3326 optimizer.zero_grad()
3327 print(f"Epoch {epoch+1}/{num_epochs} completed")
3328
3329# Final evaluation
3330final_loss = evaluate(model, dataloader)
3331print(f"Final Loss: {final_loss:.4f}")
3332print(f"Final Perplexity: {torch.exp(torch.tensor(final_loss)):.4f}")
3333
3334print(f"Loss decreased by: {initial_loss - final_loss:.4f}")
3335print(
3336 f"Perplexity decreased by: {torch.exp(torch.tensor(initial_loss)) - torch.exp(torch.tensor(final_loss)):.4f}"
3337)
3338
3339print("Job finished")
3340print(time.time() - start_time)
3341
3342
3343---
3344File: /neurons/miners/src/miner.py
3345---
3346
3347import asyncio
3348import logging
3349from contextlib import asynccontextmanager
3350
3351import uvicorn
3352from fastapi import FastAPI
3353
3354from core.config import settings
3355from core.miner import Miner
3356from routes.debug_routes import debug_apis_router
3357from routes.validator_interface import validator_router
3358from core.utils import configure_logs_of_other_modules, wait_for_services_sync
3359
3360configure_logs_of_other_modules()
3361wait_for_services_sync()
3362
3363
3364@asynccontextmanager
3365async def app_lifespan(app: FastAPI):
3366 miner = Miner()
3367 # Run the miner in the background
3368 task = asyncio.create_task(miner.start())
3369
3370 try:
3371 yield
3372 finally:
3373 await miner.stop() # Ensure proper cleanup
3374 await task # Wait for the background task to complete
3375 logging.info("Miner exited successfully.")
3376
3377
3378app = FastAPI(
3379 title=settings.PROJECT_NAME,
3380 lifespan=app_lifespan,
3381)
3382
3383app.include_router(validator_router)
3384app.include_router(debug_apis_router)
3385
3386reload = settings.ENV == "dev"
3387
3388if __name__ == "__main__":
3389 uvicorn.run("miner:app", host="0.0.0.0", port=settings.INTERNAL_PORT, reload=reload)
3390
3391
3392
3393---
3394File: /neurons/miners/tests/__init__.py
3395---
3396
3397
3398
3399
3400---
3401File: /neurons/miners/assigning_validator_hotkeys.md
3402---
3403
3404# Best Practices for Assigning Validator Hotkeys
3405
3406In the Compute Subnet, validators play a critical role in ensuring the performance and security of the network. However, miners must assign executors carefully to the validators to maximize incentives. This guide explains the best strategy for assigning validator hotkeys based on stake distribution within the network.
3407
3408## Why Validator Hotkey Assignment Matters
3409
3410You will **not receive any rewards** if your executors are not assigned to validators that control a **majority of the stake** in the network. Therefore, it’s crucial to understand how stake distribution works and how to assign your executors effectively.
3411
3412## Step-by-Step Strategy for Assigning Validator Hotkeys
3413
3414### 1. Check the Validator Stakes
3415
3416The first step is to determine how much stake each validator controls in the network. You can find the current stake distribution of all validators by visiting:
3417
3418[**TaoMarketCap Subnet 51 Validators**](https://taomarketcap.com/subnets/51/validators)
3419
3420This page lists each validator and their respective stake, which is essential for making decisions about hotkey assignments.
3421
3422### 2. Assign Executors to Cover at Least 50% of the Stake
3423
3424To begin, you need to ensure that your executors are covering **at least 50%** of the total network stake. This guarantees that your executors will be actively validated and you’ll receive rewards.
3425
3426#### Example:
3427
3428Suppose you have **100 executors** (GPUs) and the stake distribution of the validators is as follows:
3429
3430| Validator | Stake (%) |
3431|-----------|-----------|
3432| Validator 1 | 50% |
3433| Validator 2 | 25% |
3434| Validator 3 | 15% |
3435| Validator 4 | 5% |
3436| Validator 5 | 1% |
3437
3438- To cover 50% of the total stake, assign **enough executors** to cover **Validator 1** (50% stake).
3439- In this case, assign at least **one executor** to **Validator 1** because they control 50% of the network stake.
3440
3441### 3. Stake-Weighted Assignment for Remaining Executors
3442
3443Once you’ve ensured that you’re covering at least 50% of the network stake, the remaining executors should be assigned in a **stake-weighted** fashion to maximize rewards.
3444
3445#### Continuing the Example:
3446
3447You have **99 remaining executors** to assign. Distribute your full fleet of 100 in proportion to stake (the executor already assigned to Validator 1 in Step 2 counts toward its share, and since the listed stakes sum to only 96%, give any leftover executors to the highest-stake validators). For example (see the sketch after this list):
3448
3449- **Validator 1 (50% stake)**: Assign **50% of executors** to Validator 1.
3450 - Assign 50 executors.
3451- **Validator 2 (25% stake)**: Assign **25% of executors** to Validator 2.
3452 - Assign 25 executors.
3453- **Validator 3 (15% stake)**: Assign **15% of executors** to Validator 3.
3454 - Assign 15 executors.
3455- **Validator 4 (5% stake)**: Assign **5% of executors** to Validator 4.
3456 - Assign 5 executors.
3457- **Validator 5 (1% stake)**: Assign **1% of executors** to Validator 5.
3458 - Assign 1 executor.
3459
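If you prefer to compute the split programmatically, the following is a minimal sketch of the stake-weighted allocation described above. The `allocate_executors` helper is hypothetical (it is not part of this repository); it normalizes over the listed stakes so that every executor is assigned, handing rounding leftovers to the highest-stake validators first.

```python
# Hypothetical helper illustrating the stake-weighted split described above.
# Stake values are taken from the example table; replace them with live data
# from TaoMarketCap before using anything like this for real assignments.
from math import floor


def allocate_executors(stakes: dict[str, float], total_executors: int) -> dict[str, int]:
    """Split executors proportionally to stake; rounding leftovers go to the largest stakes."""
    total_stake = sum(stakes.values())
    allocation = {v: floor(total_executors * s / total_stake) for v, s in stakes.items()}
    # Hand out the executors lost to flooring, biggest stake first.
    leftover = total_executors - sum(allocation.values())
    for v in sorted(stakes, key=stakes.get, reverse=True)[:leftover]:
        allocation[v] += 1
    return allocation


stakes = {"Validator 1": 50, "Validator 2": 25, "Validator 3": 15, "Validator 4": 5, "Validator 5": 1}
print(allocate_executors(stakes, total_executors=100))
# -> {'Validator 1': 53, 'Validator 2': 26, 'Validator 3': 15, 'Validator 4': 5, 'Validator 5': 1}
```

Because the helper redistributes the unlisted 4% of stake, its counts differ slightly from the hand-worked numbers above; either split satisfies the "cover at least 50% of stake, then weight by stake" strategy.
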
3460### 4. Adjust Based on Network Dynamics
3461
3462The stake of validators can change over time. Make sure to periodically check the **validator stakes** on [TaoMarketCap](https://taomarketcap.com/subnets/51/validators) and **reassign your executors** as needed to maintain optimal rewards. If a validator’s stake increases significantly, you may want to adjust your assignments accordingly.
3463
3464## Summary of the Best Strategy
3465
3466- **Step 1**: Check the validator stakes on [TaoMarketCap](https://taomarketcap.com/subnets/51/validators).
3467- **Step 2**: Ensure your executors are covering at least **50% of the total network stake**.
3468- **Step 3**: Use a **stake-weighted** strategy to assign your remaining executors, matching the proportion of the stake each validator controls.
3469- **Step 4**: Periodically recheck the stake distribution and adjust assignments as needed.
3470
3471By following this strategy, you’ll ensure that your executors are assigned to validators in the most efficient way possible, maximizing your chances of receiving rewards.
3472
3473## Additional Resources
3474
3475- [TaoMarketCap Subnet 51 Validators](https://taomarketcap.com/subnets/51/validators)
3476- [Compute Subnet Miner README](README.md)
3477
3478
3479
3480
3481---
3482File: /neurons/miners/docker_build.sh
3483---
3484
3485#!/bin/bash
3486set -eux -o pipefail
3487
3488IMAGE_NAME="daturaai/compute-subnet-miner:$TAG"
3489
3490docker build --build-context datura=../../datura -t "$IMAGE_NAME" .
3491
3492
3493---
3494File: /neurons/miners/docker_publish.sh
3495---
3496
3497#!/bin/bash
3498set -eux -o pipefail
3499
3500source ./docker_build.sh
3501
3502echo "$DOCKERHUB_PAT" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
3503docker push "$IMAGE_NAME"
3504
3505
3506---
3507File: /neurons/miners/docker_runner_build.sh
3508---
3509
3510#!/bin/bash
3511set -eux -o pipefail
3512
3513IMAGE_NAME="daturaai/compute-subnet-miner-runner:$TAG"
3514
3515docker build --file Dockerfile.runner -t "$IMAGE_NAME" .
3516
3517
3518---
3519File: /neurons/miners/docker_runner_publish.sh
3520---
3521
3522#!/bin/bash
3523set -eux -o pipefail
3524
3525source ./docker_runner_build.sh
3526
3527echo "$DOCKERHUB_PAT" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
3528docker push "$IMAGE_NAME"
3529
3530
3531---
3532File: /neurons/miners/entrypoint.sh
3533---
3534
3535#!/bin/sh
3536set -eu
3537
3538docker compose up --pull always --detach --wait --force-recreate
3539
3540# Clean docker images
3541docker image prune -f
3542
3543while true
3544do
3545 docker compose logs -f
3546 echo 'All containers died'
3547 sleep 10
3548done
3549
3550
3551
3552---
3553File: /neurons/miners/README.md
3554---
3555
3556# Miner
3557
3558## Overview
3559
3560This miner allows you to contribute your GPU resources to the Compute Subnet and earn compensation for providing computational power. You will run a central miner on a CPU server, which manages multiple executors running on GPU-equipped machines.
3561
3562### Central Miner Server Requirements
3563
3564To run the central miner, you only need a CPU server with the following specifications:
3565
3566- **CPU**: 4 cores
3567- **RAM**: 8GB
3568- **Storage**: 50GB available disk space
3569- **OS**: Ubuntu (recommended)
3570
3571### Executors
3572
3573Executors are GPU-equipped machines that perform the computational tasks. The central miner manages these executors, which can be easily added or removed from the network.
3574
3575To see which GPUs are compatible for mining and their relative rewards, see the dictionary [here](https://github.com/Datura-ai/compute-subnet/blob/main/neurons/validators/src/services/const.py#L3).
3576
3577## Installation
3578
3579### Using Docker
3580
3581#### Step 1: Clone the Git Repository
3582
3583```
3584git clone https://github.com/Datura-ai/compute-subnet.git
3585```
3586
3587#### Step 2: Install Required Tools
3588
3589```
3590cd compute-subnet && chmod +x scripts/install_miner_on_ubuntu.sh && ./scripts/install_miner_on_ubuntu.sh
3591```
3592
3593Verify that bittensor and docker are installed:
3594```
3595btcli --version
3596```
3597
3598```
3599docker --version
3600```
3601
3602If either one isn't installed properly, install it using the following links:
3603For bittensor, use [This Link](https://github.com/opentensor/bittensor/blob/master/README.md#install-bittensor-sdk)
3604For docker, use [This Link](https://docs.docker.com/engine/install/)
3605
3606#### Step 3: Setup ENV
3607```
3608cp neurons/miners/.env.template neurons/miners/.env
3609```
3610
3611Fill in your information for:
3612
3613`BITTENSOR_WALLET_NAME`: Your wallet name for Bittensor. You can check this with `btcli wallet list`
3614
3615`BITTENSOR_WALLET_HOTKEY_NAME`: The hotkey name of your wallet's registered hotkey. If it is not registered, run `btcli subnet register --netuid 51`.
3616
3617`EXTERNAL_IP_ADDRESS`: The external IP address of your central miner server. Make sure it is open to external connections on the `EXTERNAL_PORT`.
3618
3619`HOST_WALLET_DIR`: The directory path of your wallet on the machine.
3620
3621`INTERNAL_PORT` and `EXTERNAL_PORT`: Optionally customize these ports. Make sure `EXTERNAL_PORT` is open so validators can connect to it.
3622
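For reference, a filled-in `.env` might look like the following. All values are illustrative placeholders (not defaults from the repository); replace them with your own configuration:

```
BITTENSOR_WALLET_NAME=my_wallet
BITTENSOR_WALLET_HOTKEY_NAME=my_hotkey
EXTERNAL_IP_ADDRESS=203.0.113.10
HOST_WALLET_DIR=/home/ubuntu/.bittensor/wallets
INTERNAL_PORT=8000
EXTERNAL_PORT=8000
```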
3623
3624#### Step 4: Start the Miner
3625
3626```
3627cd neurons/miners && docker compose up -d
3628```
3629
3630## Managing Executors
3631
3632### Adding an Executor
3633
3634Executors are machines running on GPUs that you can add to your central miner. The more executors (GPUs) you have, the greater your compensation will be. Here's how to add them:
3635
36361. Ensure the executor machine is set up and running Docker. For more information, follow the [executor README.md here](../executor/README.md)
36372. Use the following command to add an executor to the central miner:
3638
3639 ```bash
3640 docker exec <container-id or name> python /root/app/src/cli.py add-executor --address <executor-ip-address> --port <executor-port> --validator <validator-hotkey>
3641 ```
3642
3643 - `<executor-ip-address>`: The IP address of the executor machine.
3644 - `<executor-port>`: The port number used for the executor (default: `8001`).
3645 - `<validator-hotkey>`: The validator hotkey that you want to give access to this executor. Which validator hotkey should you pick? Follow [this guide](assigning_validator_hotkeys.md)
3646
3647### What is a Validator Hotkey?
3648
3649The **validator hotkey** is a unique identifier tied to a validator that authenticates and verifies the performance of your executor machines. When you specify a validator hotkey during executor registration, it ensures that your executor is validated by this specific validator.
3650
3651To switch to a different validator, first follow the instructions for removing an executor, then re-register it with the new validator hotkey by running the add-executor command again (Step 2 of Adding an Executor).
3652
3653### Removing an Executor
3654
3655To remove an executor from the central miner, follow these steps:
3656
36571. Run the following command to remove the executor:
3658
3659 ```bash
3660 docker exec <docker instance> python /root/app/src/cli.py remove-executor --address <executor public ip> --port <executor external port>
3661 ```
3662
3663
3664### Monitoring earnings
3665
3666To monitor your earnings, use [Taomarketcap.com](https://taomarketcap.com/subnets/51/miners)'s subnet 51 miner page to track your daily rewards and your performance relative to other miners.
3667
3668
3669
3670---
3671File: /neurons/miners/run.sh
3672---
3673
3674#!/bin/sh
3675
3676# db migrate
3677alembic upgrade head
3678
3679# run fastapi app
3680python src/miner.py
3681
3682
3683---
3684File: /neurons/validators/migrations/versions/0653dc97382a_add_executors_table.py
3685---
3686
3687"""Add executors table
3688
3689Revision ID: 0653dc97382a
3690Revises: d5037a3f7b99
3691Create Date: 2024-09-10 09:42:38.878136
3692
3693"""
3694from typing import Sequence, Union
3695
3696from alembic import op
3697import sqlalchemy as sa
3698import sqlmodel
3699import sqlmodel.sql.sqltypes
3700
3701
3702# revision identifiers, used by Alembic.
3703revision: str = '0653dc97382a'
3704down_revision: Union[str, None] = 'd5037a3f7b99'
3705branch_labels: Union[str, Sequence[str], None] = None
3706depends_on: Union[str, Sequence[str], None] = None
3707
3708
3709def upgrade() -> None:
3710 # ### commands auto generated by Alembic - please adjust! ###
3711 op.create_table('executor',
3712 sa.Column('uuid', sa.Uuid(), nullable=False),
3713 sa.Column('miner_address', sqlmodel.sql.sqltypes.AutoString(), nullable=False),
3714 sa.Column('miner_port', sa.Integer(), nullable=False),
3715 sa.Column('miner_hotkey', sqlmodel.sql.sqltypes.AutoString(), nullable=False),
3716 sa.Column('executor_id', sa.Uuid(), nullable=False),
3717 sa.Column('executor_ip_address', sqlmodel.sql.sqltypes.AutoString(), nullable=False),
3718 sa.Column('executor_ssh_username', sqlmodel.sql.sqltypes.AutoString(), nullable=False),
3719 sa.Column('executor_ssh_port', sa.Integer(), nullable=False),
3720 sa.Column('rented', sa.Boolean(), nullable=True),
3721 sa.PrimaryKeyConstraint('uuid')
3722 )
3723 op.add_column('task', sa.Column('executor_id', sa.Uuid(), nullable=False))
3724 op.drop_column('task', 'ssh_private_key')
3725 # ### end Alembic commands ###
3726
3727
3728def downgrade() -> None:
3729 # ### commands auto generated by Alembic - please adjust! ###
3730 op.add_column('task', sa.Column('ssh_private_key', sa.VARCHAR(), autoincrement=False, nullable=False))
3731 op.drop_column('task', 'executor_id')
3732 op.drop_table('executor')
3733 # ### end Alembic commands ###
3734
3735
3736
3737---
3738File: /neurons/validators/migrations/versions/d5037a3f7b99_create_task_model.py
3739---
3740
3741"""create task model
3742
3743Revision ID: d5037a3f7b99
3744Revises:
3745Create Date: 2024-08-19 17:57:42.735518
3746
3747"""
3748from typing import Sequence, Union
3749
3750from alembic import op
3751import sqlalchemy as sa
3752import sqlmodel
3753import sqlmodel.sql.sqltypes
3754
3755
3756# revision identifiers, used by Alembic.
3757revision: str = 'd5037a3f7b99'
3758down_revision: Union[str, None] = None
3759branch_labels: Union[str, Sequence[str], None] = None
3760depends_on: Union[str, Sequence[str], None] = None
3761
3762
3763def upgrade() -> None:
3764 # ### commands auto generated by Alembic - please adjust! ###
3765 op.create_table('task',
3766 sa.Column('uuid', sa.Uuid(), nullable=False),
3767 sa.Column('task_status', sa.Enum('Initiated', 'SSHConnected', 'Failed', 'Finished', name='taskstatus'), nullable=True),
3768 sa.Column('miner_hotkey', sqlmodel.sql.sqltypes.AutoString(), nullable=False),
3769 sa.Column('ssh_private_key', sqlmodel.sql.sqltypes.AutoString(), nullable=False),
3770 sa.Column('created_at', sa.DateTime(), nullable=False),
3771 sa.Column('proceed_time', sa.Integer(), nullable=True),
3772 sa.Column('score', sa.Float(), nullable=True),
3773 sa.PrimaryKeyConstraint('uuid')
3774 )
3775 # ### end Alembic commands ###
3776
3777
3778def downgrade() -> None:
3779 # ### commands auto generated by Alembic - please adjust! ###
3780 op.drop_table('task')
3781 # ### end Alembic commands ###
3782
3783
3784
3785---
3786File: /neurons/validators/migrations/env.py
3787---
3788
3789import os
3790from logging.config import fileConfig
3791from pathlib import Path
3792
3793from alembic import context
3794from dotenv import load_dotenv
3795from sqlalchemy import engine_from_config, pool
3796from sqlmodel import SQLModel
3797
3798from models.executor import * # noqa
3799from models.task import * # noqa
3800
3801# this is the Alembic Config object, which provides
3802# access to the values within the .ini file in use.
3803config = context.config
3804
3805# Interpret the config file for Python logging.
3806# This line sets up loggers basically.
3807if config.config_file_name is not None:
3808 fileConfig(config.config_file_name)
3809
3810# add your model's MetaData object here
3811# for 'autogenerate' support
3812# from myapp import mymodel
3813# target_metadata = mymodel.Base.metadata
3814
3815target_metadata = SQLModel.metadata
3816
3817# other values from the config, defined by the needs of env.py,
3818# can be acquired:
3819# my_important_option = config.get_main_option("my_important_option")
3820# ... etc.
3821
3822current_dir = Path(__file__).parent
3823
3824load_dotenv(str(current_dir / ".." / ".env"))
3825
3826
3827def get_url():
3828 url = os.getenv("SQLALCHEMY_DATABASE_URI")
3829 return url
3830
3831
3832def run_migrations_offline() -> None:
3833 """Run migrations in 'offline' mode.
3834
3835 This configures the context with just a URL
3836 and not an Engine, though an Engine is acceptable
3837 here as well. By skipping the Engine creation
3838 we don't even need a DBAPI to be available.
3839
3840 Calls to context.execute() here emit the given string to the
3841 script output.
3842
3843 """
3844 url = get_url()
3845 context.configure(
3846 url=url,
3847 target_metadata=target_metadata,
3848 literal_binds=True,
3849 dialect_opts={"paramstyle": "named"},
3850 )
3851
3852 with context.begin_transaction():
3853 context.run_migrations()
3854
3855
3856def run_migrations_online() -> None:
3857 """Run migrations in 'online' mode.
3858
3859 In this scenario we need to create an Engine
3860 and associate a connection with the context.
3861
3862 """
3863 configuration = config.get_section(config.config_ini_section)
3864 configuration["sqlalchemy.url"] = get_url()
3865 connectable = engine_from_config(
3866 configuration,
3867 prefix="sqlalchemy.",
3868 poolclass=pool.NullPool,
3869 )
3870
3871 with connectable.connect() as connection:
3872 context.configure(connection=connection, target_metadata=target_metadata)
3873
3874 with context.begin_transaction():
3875 context.run_migrations()
3876
3877
3878if context.is_offline_mode():
3879 run_migrations_offline()
3880else:
3881 run_migrations_online()
3882
3883
3884
3885---
3886File: /neurons/validators/src/clients/__init__.py
3887---
3888
3889
3890
3891
3892---
3893File: /neurons/validators/src/clients/compute_client.py
3894---
3895
3896import asyncio
3897import json
3898import logging
3899from typing import NoReturn
3900
3901import bittensor
3902import pydantic
3903import redis.asyncio as aioredis
3904import tenacity
3905import websockets
3906from datura.requests.base import BaseRequest
3907from payload_models.payloads import (
3908 ContainerBaseRequest,
3909 ContainerCreated,
3910 ContainerCreateRequest,
3911 ContainerDeleteRequest,
3912 ContainerStartRequest,
3913 ContainerStopRequest,
3914 DuplicateExecutorsResponse,
3915 FailedContainerRequest,
3916)
3917from protocol.vc_protocol.compute_requests import Error, RentedMachineResponse, Response
3918from protocol.vc_protocol.validator_requests import (
3919 AuthenticateRequest,
3920 DuplicateExecutorsRequest,
3921 ExecutorSpecRequest,
3922 LogStreamRequest,
3923 RentedMachineRequest,
3924)
3925from pydantic import BaseModel
3926from websockets.asyncio.client import ClientConnection
3927
3928from clients.metagraph_client import create_metagraph_refresh_task, get_miner_axon_info
3929from core.utils import _m, get_extra_info
3930from services.miner_service import MinerService
3931from services.redis_service import (
3932 DUPLICATED_MACHINE_SET,
3933 MACHINE_SPEC_CHANNEL_NAME,
3934 RENTED_MACHINE_SET,
3935 STREAMING_LOG_CHANNEL,
3936)
3937
3938logger = logging.getLogger(__name__)
3939
3940
3941class AuthenticationError(Exception):
3942 def __init__(self, reason: str, errors: list[Error]):
3943 self.reason = reason
3944 self.errors = errors
3945
3946
3947class ComputeClient:
3948 HEARTBEAT_PERIOD = 60
3949
3950 def __init__(
3951 self, keypair: bittensor.Keypair, compute_app_uri: str, miner_service: MinerService
3952 ):
3953 self.keypair = keypair
3954 self.ws: ClientConnection | None = None
3955 self.compute_app_uri = compute_app_uri
3956 self.miner_drivers = asyncio.Queue()
3957 self.miner_driver_awaiter_task = asyncio.create_task(self.miner_driver_awaiter())
3958 # self.heartbeat_task = asyncio.create_task(self.heartbeat())
3959 self.refresh_metagraph_task = self.create_metagraph_refresh_task()
3960 self.miner_service = miner_service
3961
3962 self.logging_extra = get_extra_info(
3963 {
3964 "validator_hotkey": self.my_hotkey(),
3965 "compute_app_uri": compute_app_uri,
3966 }
3967 )
3968
3969 def accepted_request_type(self) -> type[BaseRequest]:
3970 return ContainerBaseRequest
3971
3972 def connect(self):
3973 """Create an awaitable/async-iterable websockets.connect() object"""
3974 logger.info(
3975 _m(
3976 "Connecting to backend app",
3977 extra=self.logging_extra,
3978 )
3979 )
3980 return websockets.connect(self.compute_app_uri)
3981
3982 async def miner_driver_awaiter(self):
3983 """avoid memory leak by awaiting miner driver tasks"""
3984 while True:
3985 task = await self.miner_drivers.get()
3986 if task is None:
3987 return
3988
3989 try:
3990 await task
3991 except Exception as exc:
3992 logger.error(
3993 _m(
3994 "Error occurred during driving a miner client",
3995 extra={**self.logging_extra, "error": str(exc)},
3996 )
3997 )
3998
3999 async def __aenter__(self):
4000        return self
4001
4002 async def __aexit__(self, exc_type, exc_val, exc_tb):
4003 await self.miner_drivers.put(None)
4004 await self.miner_driver_awaiter_task
4005
4006 def my_hotkey(self) -> str:
4007 return self.keypair.ss58_address
4008
4009 async def run_forever(self) -> NoReturn:
4010 """connect (and re-connect) to facilitator and keep reading messages ... forever"""
4011 try:
4012 # subscribe to channel to get machine specs
4013 pubsub = await self.miner_service.redis_service.subscribe(MACHINE_SPEC_CHANNEL_NAME)
4014 log_channel = await self.miner_service.redis_service.subscribe(STREAMING_LOG_CHANNEL)
4015
4016 # send machine specs to facilitator
4017 self.specs_task = asyncio.create_task(self.wait_for_specs(pubsub))
4018 asyncio.create_task(self.wait_for_log_streams(log_channel))
4019 except Exception as exc:
4020 logger.error(
4021 _m("redis connection error", extra={**self.logging_extra, "error": str(exc)})
4022 )
4023
4024 asyncio.create_task(self.poll_rented_machines())
4025
4026 try:
4027 while True:
4028 async for ws in self.connect():
4029 try:
4030 logger.info(
4031 _m(
4032 "Connected to backend app",
4033 extra=self.logging_extra,
4034 )
4035 )
4036 await self.handle_connection(ws)
4037 except websockets.ConnectionClosed as exc:
4038 self.ws = None
4039 logger.warning(
4040 _m(
4041 f"validator connection to backend app closed with code {exc.code} and reason {exc.reason}, reconnecting...",
4042 extra=self.logging_extra,
4043 )
4044 )
4045 except asyncio.exceptions.CancelledError:
4046 self.ws = None
4047 logger.warning(
4048 _m(
4049 "Facilitator client received cancel, stopping",
4050 extra=self.logging_extra,
4051 )
4052 )
4053 except Exception:
4054 self.ws = None
4055 logger.error(
4056 _m(
4057 "Error in connecting to compute app",
4058 extra=self.logging_extra,
4059 )
4060 )
4061
4062 except Exception as exc:
4063 self.ws = None
4064 logger.error(
4065 _m(
4066 "Connecting to compute app failed",
4067 extra={**self.logging_extra, "error": str(exc)},
4068 ),
4069 exc_info=True,
4070 )
4071
4072 async def handle_connection(self, ws: ClientConnection):
4073 """handle a single websocket connection"""
4074 await ws.send(AuthenticateRequest.from_keypair(self.keypair).model_dump_json())
4075
4076 raw_msg = await ws.recv()
4077 try:
4078 response = Response.model_validate_json(raw_msg)
4079 except pydantic.ValidationError as exc:
4080 raise AuthenticationError(
4081 "did not receive Response for AuthenticationRequest", []
4082 ) from exc
4083 if response.status != "success":
4084 raise AuthenticationError("auth request received failed response", response.errors)
4085
4086 self.ws = ws
4087
4088 async for raw_msg in ws:
4089 await self.handle_message(raw_msg)
4090
4091 async def wait_for_specs(self, channel: aioredis.client.PubSub):
4092 specs_queue = []
4093 while True:
4094 validator_hotkey = self.my_hotkey()
4095
4096 logger.info(
4097 _m(
4098 f"Waiting for machine specs from validator app: {validator_hotkey}",
4099 extra=self.logging_extra,
4100 )
4101 )
4102 try:
4103 msg = await channel.get_message(ignore_subscribe_messages=True, timeout=100 * 60)
4104 logger.info(
4105 _m(
4106 "Received machine specs from validator app.",
4107 extra={**self.logging_extra},
4108 )
4109 )
4110
4111 if msg is None:
4112 logger.warning(
4113 _m(
4114 "No message received from validator app.",
4115 extra=self.logging_extra,
4116 )
4117 )
4118 continue
4119
4120 msg = json.loads(msg["data"])
4121 specs = None
4122 executor_logging_extra = {}
4123 try:
4124 specs = ExecutorSpecRequest(
4125 specs=msg["specs"],
4126 score=msg["score"],
4127 synthetic_job_score=msg["synthetic_job_score"],
4128 log_status=msg["log_status"],
4129 job_batch_id=msg["job_batch_id"],
4130 log_text=msg["log_text"],
4131 miner_hotkey=msg["miner_hotkey"],
4132 validator_hotkey=validator_hotkey,
4133 executor_uuid=msg["executor_uuid"],
4134 executor_ip=msg["executor_ip"],
4135 executor_port=msg["executor_port"],
4136 )
4137 executor_logging_extra = {
4138 "executor_uuid": msg["executor_uuid"],
4139 "executor_ip": msg["executor_ip"],
4140 "executor_port": msg["executor_port"],
4141 "job_batch_id": msg["job_batch_id"],
4142 }
4143 except Exception as exc:
4144 msg = "Error occurred while parsing msg"
4145 logger.error(
4146 _m(
4147 msg,
4148 extra={
4149 **self.logging_extra,
4150 **executor_logging_extra,
4151 "error": str(exc),
4152 },
4153 )
4154 )
4155 continue
4156
4157 logger.info(
4158 "Sending machine specs update of executor to compute app",
4159 extra={**self.logging_extra, **executor_logging_extra, "specs": str(specs)},
4160 )
4161
4162 specs_queue.append(specs)
4163 if self.ws is not None:
4164 while len(specs_queue) > 0:
4165 spec_to_send = specs_queue.pop(0)
4166 try:
4167 await self.send_model(spec_to_send)
4168 except Exception as exc:
4169 specs_queue.insert(0, spec_to_send)
4170 msg = "Error occurred while sending specs of executor"
4171 logger.error(
4172 _m(
4173 msg,
4174 extra={
4175 **self.logging_extra,
4176 **executor_logging_extra,
4177 "error": str(exc),
4178 },
4179 )
4180 )
4181 break
4182 except TimeoutError:
4183 logger.error(
4184 _m(
4185 "wait_for_specs still running",
4186 extra=self.logging_extra,
4187 )
4188 )
4189
4190 async def wait_for_log_streams(self, channel: aioredis.client.PubSub):
4191 logs_queue: list[LogStreamRequest] = []
4192 while True:
4193 validator_hotkey = self.my_hotkey()
4194 logger.info(
4195 _m(
4196 f"Waiting for log streams: {validator_hotkey}",
4197 extra=self.logging_extra,
4198 )
4199 )
4200 try:
4201 msg = await channel.get_message(ignore_subscribe_messages=True, timeout=100 * 60)
4202 if msg is None:
4203 logger.warning(
4204 _m(
4205 "No log streams yet",
4206 extra=self.logging_extra,
4207 )
4208 )
4209 continue
4210
4211 msg = json.loads(msg["data"])
4212 log_stream = None
4213
4214 try:
4215 log_stream = LogStreamRequest(
4216 logs=msg["logs"],
4217 miner_hotkey=msg["miner_hotkey"],
4218 validator_hotkey=validator_hotkey,
4219 executor_uuid=msg["executor_uuid"],
4220 )
4221
4222 logger.info(
4223 _m(
4224 f'Successfully created LogStreamRequest instance with {len(msg["logs"])} logs',
4225 extra=self.logging_extra,
4226 )
4227 )
4228 except Exception as exc:
4229 logger.error(
4230 _m(
4231 "Failed to get LogStreamRequest instance",
4232 extra={
4233 **self.logging_extra,
4234 "error": str(exc),
4235 "msg": str(msg),
4236 },
4237 )
4238 )
4239 continue
4240
4241 logs_queue.append(log_stream)
4242 if self.ws is not None:
4243 while len(logs_queue) > 0:
4244 log_to_send = logs_queue.pop(0)
4245 try:
4246 await self.send_model(log_to_send)
4247 except Exception as exc:
4248 logs_queue.insert(0, log_to_send)
4249 logger.error(
4250 _m(
4251                                    "Error occurred while sending log streams",
4252 extra={
4253 **self.logging_extra,
4254 "error": str(exc),
4255 },
4256 )
4257 )
4258 break
4259 except TimeoutError:
4260 pass
4261
4262 def create_metagraph_refresh_task(self, period=None):
4263 return create_metagraph_refresh_task(period=period)
4264
4265 async def heartbeat(self):
4266 pass
4267 # while True:
4268 # if self.ws is not None:
4269 # try:
4270 # await self.send_model(Heartbeat())
4271 # except Exception as exc:
4272 # msg = f"Error occurred while sending heartbeat: {exc}"
4273 # logger.warning(msg)
4274 # await asyncio.sleep(self.HEARTBEAT_PERIOD)
4275
4276 @tenacity.retry(
4277 stop=tenacity.stop_after_attempt(7),
4278 wait=tenacity.wait_exponential(multiplier=1, exp_base=2, min=1, max=10),
4279 retry=tenacity.retry_if_exception_type(websockets.ConnectionClosed),
4280 )
4281 async def send_model(self, msg: BaseModel):
4282 if self.ws is None:
4283 raise websockets.ConnectionClosed(rcvd=None, sent=None)
4284 await self.ws.send(msg.model_dump_json())
4285 # Summary: https://github.com/python-websockets/websockets/issues/867
4286 # Longer discussion: https://github.com/python-websockets/websockets/issues/865
4287 await asyncio.sleep(0)
4288
4289 async def poll_rented_machines(self):
4290 while True:
4291 if self.ws is not None:
4292 logger.info(
4293 _m(
4294 "Request rented machines",
4295 extra=self.logging_extra,
4296 )
4297 )
4298 await self.send_model(RentedMachineRequest())
4299
4300 logger.info(
4301 _m(
4302 "Request duplicated machines",
4303 extra=self.logging_extra,
4304 )
4305 )
4306 await self.send_model(DuplicateExecutorsRequest())
4307
4308 await asyncio.sleep(10 * 60)
4309 else:
4310 await asyncio.sleep(10)
4311
4312 async def handle_message(self, raw_msg: str | bytes):
4313 """handle message received from facilitator"""
4314 try:
4315 response = Response.model_validate_json(raw_msg)
4316 except pydantic.ValidationError:
4317 pass
4318 else:
4319 if response.status != "success":
4320 logger.error(
4321 _m(
4322 "received error response from facilitator",
4323 extra={**self.logging_extra, "response": str(response)},
4324 )
4325 )
4326 return
4327
4328 try:
4329 response = pydantic.TypeAdapter(RentedMachineResponse).validate_json(raw_msg)
4330 except pydantic.ValidationError:
4331 pass
4332 else:
4333 logger.info(
4334 _m(
4335 "Rented machines",
4336 extra={**self.logging_extra, "machines": len(response.machines)},
4337 )
4338 )
4339
4340 redis_service = self.miner_service.redis_service
4341 await redis_service.delete(RENTED_MACHINE_SET)
4342
4343 for machine in response.machines:
4344 await redis_service.add_rented_machine(machine)
4345
4346 return
4347
4348 try:
4349 response = pydantic.TypeAdapter(DuplicateExecutorsResponse).validate_json(raw_msg)
4350 except pydantic.ValidationError:
4351 pass
4352 else:
4353 logger.info(
4354 _m(
4355 "Duplicated executors",
4356 extra={**self.logging_extra, "executors": len(response.executors)},
4357 )
4358 )
4359
4360 redis_service = self.miner_service.redis_service
4361 await redis_service.delete(DUPLICATED_MACHINE_SET)
4362
4363 for _, details_list in response.executors.items():
4364 for detail in details_list:
4365 executor_id = detail.get("executor_id")
4366 miner_hotkey = detail.get("miner_hotkey")
4367 await redis_service.sadd(
4368 DUPLICATED_MACHINE_SET, f"{miner_hotkey}:{executor_id}"
4369 )
4370
4371 return
4372
4373 try:
4374 job_request = self.accepted_request_type().parse(raw_msg)
4375 except Exception as ex:
4376 error_msg = f"Invalid message received from celium backend: {str(ex)}"
4377 logger.error(
4378 _m(
4379 error_msg,
4380 extra={**self.logging_extra, "error": str(ex), "raw_msg": raw_msg},
4381 )
4382 )
4383 else:
4384 task = asyncio.create_task(self.miner_driver(job_request))
4385 await self.miner_drivers.put(task)
4386 return
4387 # logger.error("unsupported message received from facilitator: %s", raw_msg)
4388
4389 async def get_miner_axon_info(self, hotkey: str) -> bittensor.AxonInfo:
4390 return await get_miner_axon_info(hotkey)
4391
4392 async def miner_driver(
4393 self,
4394 job_request: ContainerCreateRequest
4395 | ContainerDeleteRequest
4396 | ContainerStopRequest
4397 | ContainerStartRequest,
4398 ):
4399 """drive a miner client from job start to completion, then close miner connection"""
4400 miner_axon_info = await self.get_miner_axon_info(job_request.miner_hotkey)
4401 logging_extra = {
4402 **self.logging_extra,
4403 "miner_hotkey": job_request.miner_hotkey,
4404 "miner_ip": miner_axon_info.ip,
4405 "miner_port": miner_axon_info.port,
4406 "job_request": str(job_request),
4407 "executor_id": str(job_request.executor_id),
4408 }
4409 logger.info(
4410 _m(
4411 "Miner driver to miner",
4412 extra=logging_extra,
4413 )
4414 )
4415
4416 if isinstance(job_request, ContainerCreateRequest):
4417 logger.info(
4418 _m(
4419 "Creating container for executor.",
4420 extra={**logging_extra, "job_request": str(job_request)},
4421 )
4422 )
4423 job_request.miner_address = miner_axon_info.ip
4424 job_request.miner_port = miner_axon_info.port
4425 container_created: (
4426 ContainerCreated | FailedContainerRequest
4427 ) = await self.miner_service.handle_container(job_request)
4428
4429 logger.info(
4430 _m(
4431 "Sending back created container info to compute app",
4432 extra={**logging_extra, "container_created": str(container_created)},
4433 )
4434 )
4435 await self.send_model(container_created)
4436 elif isinstance(job_request, ContainerDeleteRequest):
4437 job_request.miner_address = miner_axon_info.ip
4438 job_request.miner_port = miner_axon_info.port
4439 response: (
4440 ContainerDeleteRequest | FailedContainerRequest
4441 ) = await self.miner_service.handle_container(job_request)
4442
4443 logger.info(
4444 _m(
4445 "Sending back deleted container info to compute app",
4446 extra={**logging_extra, "response": str(response)},
4447 )
4448 )
4449 await self.send_model(response)
4450 elif isinstance(job_request, ContainerStopRequest):
4451 job_request.miner_address = miner_axon_info.ip
4452 job_request.miner_port = miner_axon_info.port
4453 response: (
4454 ContainerStopRequest | FailedContainerRequest
4455 ) = await self.miner_service.handle_container(job_request)
4456
4457 logger.info(
4458 _m(
4459 "Sending back stopped container info to compute app",
4460 extra={**logging_extra, "response": str(response)},
4461 )
4462 )
4463 await self.send_model(response)
4464 elif isinstance(job_request, ContainerStartRequest):
4465 job_request.miner_address = miner_axon_info.ip
4466 job_request.miner_port = miner_axon_info.port
4467 response: (
4468 ContainerStartRequest | FailedContainerRequest
4469 ) = await self.miner_service.handle_container(job_request)
4470
4471 logger.info(
4472 _m(
4473 "Sending back started container info to compute app",
4474 extra={**logging_extra, "response": str(response)},
4475 )
4476 )
4477 await self.send_model(response)
4478
4479
4480
4481---
4482File: /neurons/validators/src/clients/metagraph_client.py
4483---
4484
4485import asyncio
4486import datetime as dt
4487import logging
4488
4489import bittensor
4490from asgiref.sync import sync_to_async
4491
4492from core.config import settings
4493
4494logger = logging.getLogger(__name__)
4495
4496
4497class AsyncMetagraphClient:
4498 def __init__(self, cache_time=dt.timedelta(minutes=5)):
4499 self.cache_time = cache_time
4500 self._metagraph_future = None
4501 self._future_lock = asyncio.Lock()
4502 self._cached_metagraph = None
4503 self._cache_timestamp = None
4504 self.config = settings.get_bittensor_config()
4505
4506 async def get_metagraph(self, ignore_cache=False):
4507 future = None
4508 set_result = False
4509 if self._cached_metagraph is not None:
4510 if not ignore_cache and dt.datetime.now() - self._cache_timestamp < self.cache_time:
4511 return self._cached_metagraph
4512 async with self._future_lock:
4513 if self._metagraph_future is None:
4514 loop = asyncio.get_running_loop()
4515 future = self._metagraph_future = loop.create_future()
4516 set_result = True
4517 else:
4518 future = self._metagraph_future
4519 if set_result:
4520 try:
4521 result = await self._get_metagraph()
4522 except Exception as exc:
4523 future.set_exception(exc)
4524 raise
4525 else:
4526 future.set_result(result)
4527 self._cache_timestamp = dt.datetime.now()
4528 self._cached_metagraph = result
4529 return result
4530 finally:
4531 async with self._future_lock:
4532 self._metagraph_future = None
4533 else:
4534 return await future
4535
4536 def _get_subtensor(self):
4537 return bittensor.subtensor(config=self.config)
4538
4539 @sync_to_async(thread_sensitive=False)
4540 def _get_metagraph(self):
4541 return self._get_subtensor().metagraph(netuid=settings.BITTENSOR_NETUID)
4542
4543 async def periodic_refresh(self, period=None):
4544 if period is None:
4545 period = self.cache_time.total_seconds()
4546 while True:
4547 try:
4548 await self.get_metagraph(ignore_cache=True)
4549 except Exception as exc:
4550 msg = f"Failed to refresh metagraph: {exc}"
4551 logger.warning(msg)
4552
4553 await asyncio.sleep(period)
4554
4555
4556async_metagraph_client = AsyncMetagraphClient()
4557
4558
4559async def get_miner_axon_info(hotkey: str) -> bittensor.AxonInfo:
4560 metagraph = await async_metagraph_client.get_metagraph()
4561 neurons = [n for n in metagraph.neurons if n.hotkey == hotkey]
4562 if not neurons:
4563 raise ValueError(f"Miner with {hotkey=} not present in this subnetwork")
4564 return neurons[0].axon_info
4565
4566
4567def create_metagraph_refresh_task(period=None):
4568 return asyncio.create_task(async_metagraph_client.periodic_refresh(period=period))
4569
4570
4571
4572---
4573File: /neurons/validators/src/clients/miner_client.py
4574---
4575
4576import abc
4577import asyncio
4578import logging
4579import random
4580import time
4581
4582import bittensor
4583import websockets
4584from websockets.asyncio.client import ClientConnection
4585from websockets.protocol import State as WebSocketClientState
4586from datura.errors.protocol import UnsupportedMessageReceived
4587from datura.requests.base import BaseRequest
4588from datura.requests.miner_requests import (
4589 AcceptJobRequest,
4590 AcceptSSHKeyRequest,
4591 BaseMinerRequest,
4592 DeclineJobRequest,
4593 FailedRequest,
4594 GenericError,
4595 SSHKeyRemoved,
4596 UnAuthorizedRequest,
4597)
4598from datura.requests.validator_requests import AuthenticateRequest, AuthenticationPayload
4599
4600from core.utils import _m, get_extra_info
4601
4602logger = logging.getLogger(__name__)
4603
4604
4605class JobState:
4606 def __init__(self):
4607 self.miner_ready_or_declining_future = asyncio.Future()
4608 self.miner_ready_or_declining_timestamp: int = 0
4609 self.miner_accepted_ssh_key_or_failed_future = asyncio.Future()
4610 self.miner_accepted_ssh_key_or_failed_timestamp: int = 0
4611 self.miner_removed_ssh_key_future = asyncio.Future()
4612
4613
4614class MinerClient(abc.ABC):
4615 def __init__(
4616 self,
4617 loop: asyncio.AbstractEventLoop,
4618 miner_address: str,
4619 my_hotkey: str,
4620 miner_hotkey: str,
4621 miner_port: int,
4622 keypair: bittensor.Keypair,
4623 miner_url: str,
4624 ):
4625 self.debounce_counter = 0
4626 self.max_debounce_count: int | None = 5 # set to None for unlimited debounce
4627 self.loop = loop
4628 self.miner_name = f"{miner_hotkey}({miner_address}:{miner_port})"
4629 self.ws: ClientConnection | None = None
4630 self.read_messages_task: asyncio.Task | None = None
4631 self.deferred_send_tasks: list[asyncio.Task] = []
4632
4633 self.miner_hotkey = miner_hotkey
4634 self.my_hotkey = my_hotkey
4635 self.miner_address = miner_address
4636 self.miner_port = miner_port
4637 self.keypair = keypair
4638
4639 self.miner_url = miner_url
4640
4641 self.job_state = JobState()
4642
4643 self.logging_extra = {
4644 "miner_hotkey": miner_hotkey,
4645 "miner_address": miner_address,
4646 "miner_port": miner_port,
4647 }
4648
4649 def accepted_request_type(self) -> type[BaseRequest]:
4650 return BaseMinerRequest
4651
4652 async def handle_message(self, msg: BaseRequest):
4653 """
4654 Handle the message based on its type or raise UnsupportedMessageReceived
4655 """
4656 if isinstance(msg, AcceptJobRequest):
4657 if not self.job_state.miner_ready_or_declining_future.done():
4658 self.job_state.miner_ready_or_declining_timestamp = time.time()
4659 self.job_state.miner_ready_or_declining_future.set_result(msg)
4660 elif isinstance(
4661 msg, AcceptSSHKeyRequest | FailedRequest | UnAuthorizedRequest | DeclineJobRequest
4662 ):
4663 if not self.job_state.miner_accepted_ssh_key_or_failed_future.done():
4664 self.job_state.miner_accepted_ssh_key_or_failed_timestamp = time.time()
4665 self.job_state.miner_accepted_ssh_key_or_failed_future.set_result(msg)
4666 elif isinstance(msg, SSHKeyRemoved):
4667 if not self.job_state.miner_removed_ssh_key_future.done():
4668 self.job_state.miner_removed_ssh_key_future.set_result(msg)
4669
4670 async def __aenter__(self):
4671 await self.await_connect()
4672
4673 async def __aexit__(self, exc_type, exc_val, exc_tb):
4674 for t in self.deferred_send_tasks:
4675 t.cancel()
4676
4677 if self.read_messages_task is not None and not self.read_messages_task.done():
4678 self.read_messages_task.cancel()
4679
4680 if self.ws is not None and self.ws.state is WebSocketClientState.OPEN:
4681 try:
4682 await self.ws.close()
4683 except Exception:
4684 pass
4685
4686 def generate_authentication_message(self) -> AuthenticateRequest:
4687 """Generate authentication request/message for miner."""
4688 payload = AuthenticationPayload(
4689 validator_hotkey=self.my_hotkey,
4690 miner_hotkey=self.miner_hotkey,
4691 timestamp=int(time.time()),
4692 )
4693 return AuthenticateRequest(
4694 payload=payload, signature=f"0x{self.keypair.sign(payload.blob_for_signing()).hex()}"
4695 )
4696
4697 async def _connect(self):
4698 ws = await websockets.connect(self.miner_url, max_size=50 * (2**20)) # 50MB
4699 await ws.send(self.generate_authentication_message().json())
4700 return ws
4701
4702 async def await_connect(self):
4703 start_time = time.time()
4704 while True:
4705 try:
4706 if (
4707 self.max_debounce_count is not None
4708 and self.debounce_counter > self.max_debounce_count
4709 ):
4710 time_took = time.time() - start_time
4711 raise Exception(
4712 f"Could not connect to miner {self.miner_name} after {self.max_debounce_count} tries"
4713 f" in {time_took:0.2f} seconds"
4714 )
4715 if self.debounce_counter:
4716 sleep_time = self.sleep_time()
4717 logger.info(
4718 _m(
4719                        f"Retrying connection to miner in {sleep_time:0.2f} seconds",
4720 extra=get_extra_info(self.logging_extra)
4721 )
4722 )
4723 await asyncio.sleep(sleep_time)
4724 self.ws = await self._connect()
4725 self.read_messages_task = self.loop.create_task(self.read_messages())
4726
4727 if self.debounce_counter:
4728 logger.info(
4729 _m(
4730 f"Connected to miner after {self.debounce_counter + 1} attempts",
4731 extra=get_extra_info(self.logging_extra),
4732 )
4733 )
4734 return
4735 except (websockets.WebSocketException, OSError) as ex:
4736 self.debounce_counter += 1
4737 logger.error(
4738 _m(
4739 f"Could not connect to miner: {str(ex)}",
4740 extra=get_extra_info(
4741 {**self.logging_extra, "debounce_counter": self.debounce_counter}
4742 ),
4743 )
4744 )
4745
4746 def sleep_time(self):
4747 return (2**self.debounce_counter) + random.random()
4748
4749 async def ensure_connected(self):
4750 if self.ws is None or self.ws.state is not WebSocketClientState.OPEN:
4751 if self.read_messages_task is not None and not self.read_messages_task.done():
4752 self.read_messages_task.cancel()
4753 await self.await_connect()
4754
4755 async def send_model(self, model: BaseRequest):
4756 while True:
4757 await self.ensure_connected()
4758 try:
4759 await self.ws.send(model.json())
4760 except websockets.WebSocketException:
4761 logger.error(
4762 _m(
4763 "Could not send to miner. Retrying 1+ seconds later...",
4764 extra=get_extra_info({**self.logging_extra, "model": str(model)}),
4765 )
4766 )
4767 await asyncio.sleep(1 + random.random())
4768 continue
4769 return
4770
4771 def deferred_send_model(self, model: BaseRequest):
4772 task = self.loop.create_task(self.send_model(model))
4773 self.deferred_send_tasks.append(task)
4774
4775 async def read_messages(self):
4776 while True:
4777 try:
4778 msg = await self.ws.recv()
4779 except websockets.WebSocketException as ex:
4780 self.debounce_counter += 1
4781 logger.error(
4782 _m(
4783 "Connection to miner lost",
4784 extra=get_extra_info(
4785 {
4786 **self.logging_extra,
4787 "debounce_counter": self.debounce_counter,
4788 "error": str(ex),
4789 }
4790 ),
4791 )
4792 )
4793 self.loop.create_task(self.await_connect())
4794 return
4795
4796 try:
4797 msg = self.accepted_request_type().parse(msg)
4798 except Exception as ex:
4799 error_msg = f"Malformed message from miner: {str(ex)}"
4800 logger.error(
4801 _m(
4802 error_msg,
4803 extra=get_extra_info({**self.logging_extra, "error": str(ex)}),
4804 )
4805 )
4806 continue
4807
4808            if isinstance(msg, GenericError):
4809                logger.error(
4810                    _m(
4811                        f"Received error message from miner: {msg.json()}",
4812                        extra=get_extra_info(self.logging_extra),
4813                    )
4814                )
4815                try:
4816                    await self.ws.close()
4817                except Exception:
4818                    pass
4819                continue
4820
4821 try:
4822 await self.handle_message(msg)
4823 except UnsupportedMessageReceived:
4824 error_msg = "Unsupported message from miner"
4825 logger.error(_m(error_msg, extra=get_extra_info(self.logging_extra)))
4826 else:
4827 if self.debounce_counter:
4828 logger.info(
4829 _m(
4830 f"Received valid message from miner after {self.debounce_counter + 1} connection attempts",
4831 extra=get_extra_info(self.logging_extra),
4832 )
4833
4834 )
4835 self.debounce_counter = 0
4836
4837
4838
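The reconnect loop above backs off exponentially with jitter: sleep_time() returns 2**debounce_counter plus a random fraction of a second, and await_connect() sleeps for that long before each retry until max_debounce_count is exceeded. A minimal standalone sketch of that schedule (backoff_delay is a hypothetical helper mirroring sleep_time()):

import random

def backoff_delay(attempt: int) -> float:
    # Mirrors sleep_time(): exponential growth in the retry counter plus 0-1s of jitter.
    return (2 ** attempt) + random.random()

# Roughly 2s, 4s, 8s, 16s, 32s (plus jitter) for the first five retries.
print([round(backoff_delay(n), 2) for n in range(1, 6)])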
4839---
4840File: /neurons/validators/src/core/__init__.py
4841---
4842
4843
4844
4845
4846---
4847File: /neurons/validators/src/core/config.py
4848---
4849
4850import argparse
4851import pathlib
4852from typing import TYPE_CHECKING
4853
4854import bittensor
4855from pydantic import Field
4856from pydantic_settings import BaseSettings, SettingsConfigDict
4857
4858if TYPE_CHECKING:
4859 from bittensor_wallet import Wallet
4860
4861
4862class Settings(BaseSettings):
4863 model_config = SettingsConfigDict(env_file=".env", extra="ignore")
4864 PROJECT_NAME: str = "compute-subnet-validator"
4865
4866 BITTENSOR_WALLET_DIRECTORY: pathlib.Path = Field(
4867 env="BITTENSOR_WALLET_DIRECTORY",
4868 default=pathlib.Path("~").expanduser() / ".bittensor" / "wallets",
4869 )
4870 BITTENSOR_WALLET_NAME: str = Field(env="BITTENSOR_WALLET_NAME")
4871 BITTENSOR_WALLET_HOTKEY_NAME: str = Field(env="BITTENSOR_WALLET_HOTKEY_NAME")
4872 BITTENSOR_NETUID: int = Field(env="BITTENSOR_NETUID")
4873 BITTENSOR_CHAIN_ENDPOINT: str | None = Field(env="BITTENSOR_CHAIN_ENDPOINT", default=None)
4874 BITTENSOR_NETWORK: str = Field(env="BITTENSOR_NETWORK")
4875
4876 SQLALCHEMY_DATABASE_URI: str = Field(env="SQLALCHEMY_DATABASE_URI")
4877 ASYNC_SQLALCHEMY_DATABASE_URI: str = Field(env="ASYNC_SQLALCHEMY_DATABASE_URI")
4878 DEBUG: bool = Field(env="DEBUG", default=False)
4879 DEBUG_MINER_HOTKEY: str = Field(env="DEBUG_MINER_HOTKEY", default="")
4880 DEBUG_MINER_ADDRESS: str | None = Field(env="DEBUG_MINER_ADDRESS", default=None)
4881 DEBUG_MINER_PORT: int | None = Field(env="DEBUG_MINER_PORT", default=None)
4882
4883 INTERNAL_PORT: int = Field(env="INTERNAL_PORT", default=8000)
4884 BLOCKS_FOR_JOB: int = 50
4885
4886 REDIS_HOST: str = Field(env="REDIS_HOST", default="localhost")
4887 REDIS_PORT: int = Field(env="REDIS_PORT", default=6379)
4888 COMPUTE_APP_URI: str = "wss://celiumcompute.ai"
4889 COMPUTE_REST_API_URL: str | None = Field(
4890 env="COMPUTE_REST_API_URL", default="https://celiumcompute.ai/api"
4891 )
4892
4893 ENV: str = Field(env="ENV", default="dev")
4894
4895 # Read version from version.txt
4896 VERSION: str = (pathlib.Path(__file__).parent / ".." / ".." / "version.txt").read_text().strip()
4897
4898 def get_bittensor_wallet(self) -> "Wallet":
4899 if not self.BITTENSOR_WALLET_NAME or not self.BITTENSOR_WALLET_HOTKEY_NAME:
4900 raise RuntimeError("Wallet not configured")
4901 wallet = bittensor.wallet(
4902 name=self.BITTENSOR_WALLET_NAME,
4903 hotkey=self.BITTENSOR_WALLET_HOTKEY_NAME,
4904 path=str(self.BITTENSOR_WALLET_DIRECTORY),
4905 )
4906 wallet.hotkey_file.get_keypair() # this raises errors if the keys are inaccessible
4907 return wallet
4908
4909 def get_bittensor_config(self) -> bittensor.config:
4910 parser = argparse.ArgumentParser()
4911 # bittensor.wallet.add_args(parser)
4912 # bittensor.subtensor.add_args(parser)
4913 # bittensor.axon.add_args(parser)
4914
4915 if self.BITTENSOR_NETWORK:
4916 if "--subtensor.network" in parser._option_string_actions:
4917 parser._handle_conflict_resolve(
4918 None,
4919 [("--subtensor.network", parser._option_string_actions["--subtensor.network"])],
4920 )
4921
4922 parser.add_argument(
4923 "--subtensor.network",
4924 type=str,
4925 help="network",
4926 default=self.BITTENSOR_NETWORK,
4927 )
4928
4929 if self.BITTENSOR_CHAIN_ENDPOINT:
4930 if "--subtensor.chain_endpoint" in parser._option_string_actions:
4931 parser._handle_conflict_resolve(
4932 None,
4933 [
4934 (
4935 "--subtensor.chain_endpoint",
4936 parser._option_string_actions["--subtensor.chain_endpoint"],
4937 )
4938 ],
4939 )
4940
4941 parser.add_argument(
4942 "--subtensor.chain_endpoint",
4943 type=str,
4944 help="chain endpoint",
4945 default=self.BITTENSOR_CHAIN_ENDPOINT,
4946 )
4947
4948 return bittensor.config(parser)
4949
4950 def get_debug_miner(self) -> dict:
4951 if not self.DEBUG_MINER_ADDRESS or not self.DEBUG_MINER_PORT:
4952 raise RuntimeError("Debug miner not configured")
4953
4954 miner = type("Miner", (object,), {})()
4955 miner.hotkey = self.DEBUG_MINER_HOTKEY
4956 miner.axon_info = type("AxonInfo", (object,), {})()
4957 miner.axon_info.ip = self.DEBUG_MINER_ADDRESS
4958 miner.axon_info.port = self.DEBUG_MINER_PORT
4959 return miner
4960
4961
4962settings = Settings()
4963
4964
4965
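Settings is a pydantic-settings model, so every field above can be supplied through environment variables or a .env file, and settings = Settings() at module load fails fast if a required variable is missing. A minimal sketch of providing the required variables programmatically (all values are placeholders, and it assumes the repository layout, including version.txt, is intact):

import os

# Placeholder values for illustration only.
os.environ.update({
    "BITTENSOR_WALLET_NAME": "validator",
    "BITTENSOR_WALLET_HOTKEY_NAME": "default",
    "BITTENSOR_NETUID": "12",
    "BITTENSOR_NETWORK": "finney",
    "SQLALCHEMY_DATABASE_URI": "postgresql://user:pass@localhost/validator",
    "ASYNC_SQLALCHEMY_DATABASE_URI": "postgresql+asyncpg://user:pass@localhost/validator",
})

from core.config import Settings

settings = Settings()
print(settings.BITTENSOR_NETUID, settings.REDIS_HOST)  # 12 localhost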
4966---
4967File: /neurons/validators/src/core/db.py
4968---
4969
4970from collections.abc import AsyncGenerator
4971from typing import Annotated
4972
4973from fastapi import Depends
4974from sqlalchemy.ext.asyncio import create_async_engine
4975from sqlalchemy.orm import sessionmaker
4976from sqlmodel.ext.asyncio.session import AsyncSession
4977
4978from core.config import settings
4979
4980engine = create_async_engine(str(settings.ASYNC_SQLALCHEMY_DATABASE_URI), echo=True, future=True)
4981
4982
4983async def get_db() -> AsyncGenerator[AsyncSession, None]:
4984 async_session = sessionmaker(bind=engine, class_=AsyncSession, expire_on_commit=False)
4985 async with async_session() as session:
4986 yield session
4987
4988
4989SessionDep = Annotated[AsyncSession, Depends(get_db)]
4990
4991
4992
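SessionDep wires get_db() into FastAPI's dependency injection, so each request handler receives its own AsyncSession. A minimal sketch of a route using it (the route path and query are illustrative, not part of the validator API):

from fastapi import FastAPI
from sqlalchemy import text

from core.db import SessionDep

app = FastAPI()

@app.get("/health/db")
async def db_health(session: SessionDep):
    # One AsyncSession per request, yielded by get_db() and closed afterwards.
    result = await session.execute(text("SELECT 1"))
    return {"db": result.scalar_one() == 1}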
4993---
4994File: /neurons/validators/src/core/utils.py
4995---
4996
4997import asyncio
4998import contextvars
4999import json
5000import logging
5001from logging.config import dictConfig # noqa
5002
5003from core.config import settings
5004
5005logger = logging.getLogger(__name__)
5006
5007# Create a ContextVar to hold the context information
5008context = contextvars.ContextVar("context", default="TaskService")
5009context.set("TaskService")
5010
5011
5012def wait_for_services_sync(timeout=30):
5013 """Wait until Redis and PostgreSQL connections are working."""
5014 import time
5015
5016 import psycopg2
5017 from redis import Redis
5018 from redis.exceptions import ConnectionError as RedisConnectionError
5019
5020 from core.config import settings
5021
5022 # Initialize Redis client
5023 redis_client = Redis(host=settings.REDIS_HOST, port=settings.REDIS_PORT)
5024
5025 start_time = time.time()
5026
5027 logger.info("Waiting for services to be available...")
5028
5029 while True:
5030 try:
5031 # Check Redis connection
5032 redis_client.ping()
5033 logger.info("Connected to Redis.")
5034
5035 # Check PostgreSQL connection using SQLAlchemy
5036 from sqlalchemy import create_engine, text
5037 from sqlalchemy.exc import SQLAlchemyError
5038
5039 engine = create_engine(settings.SQLALCHEMY_DATABASE_URI)
5040 try:
5041 with engine.connect() as connection:
5042 connection.execute(text("SELECT 1"))
5043 logger.info("Connected to PostgreSQL.")
5044 except SQLAlchemyError as e:
5045 logger.error("Failed to connect to PostgreSQL.")
5046 raise e
5047
5048 break # Exit loop if both connections are successful
5049 except (psycopg2.OperationalError, RedisConnectionError) as e:
5050 if time.time() - start_time > timeout:
5051 logger.error("Timeout while waiting for services to be available.")
5052 raise e
5053 logger.warning("Waiting for services to be available...")
5054 time.sleep(1)
5055
5056
5057def get_extra_info(extra: dict) -> dict:
5058 try:
5059 task = asyncio.current_task()
5060 coro_name = task.get_coro().__name__ if task else "NoTask"
5061 task_id = id(task) if task else "NoTaskID"
5062 except Exception:
5063 coro_name = "NoTask"
5064 task_id = "NoTaskID"
5065 extra_info = {
5066 "coro_name": coro_name,
5067 "task_id": task_id,
5068 **extra,
5069 }
5070 return extra_info
5071
5072
5073def configure_logs_of_other_modules():
5074 validator_hotkey = settings.get_bittensor_wallet().get_hotkey().ss58_address
5075
5076 logging.basicConfig(
5077 level=logging.INFO,
5078 format=f"Validator: {validator_hotkey} | Name: %(name)s | Time: %(asctime)s | Level: %(levelname)s | File: %(filename)s | Function: %(funcName)s | Line: %(lineno)s | Process: %(process)d | Message: %(message)s",
5079 )
5080
5081 sqlalchemy_logger = logging.getLogger("sqlalchemy")
5082 sqlalchemy_logger.setLevel(logging.WARNING)
5083
5084 class ContextFilter(logging.Filter):
5085 """
5086 This is a filter which injects contextual information into the log.
5087 """
5088
5089 def filter(self, record):
5090 record.context = context.get() or "Default"
5091 return True
5092
5093 # Create a custom formatter that adds the context to the log messages
5094 class CustomFormatter(logging.Formatter):
5095 def format(self, record):
5096 try:
5097 task = asyncio.current_task()
5098 coro_name = task.get_coro().__name__ if task else "NoTask"
5099 task_id = id(task) if task else "NoTaskID"
5100 return f"{getattr(record, 'context', 'Default')} | {coro_name} | {task_id} | {super().format(record)}"
5101 except Exception:
5102 return ""
5103
5104 asyncssh_logger = logging.getLogger("asyncssh")
5105 asyncssh_logger.setLevel(logging.WARNING)
5106
5107 # Add the filter to the logger
5108 asyncssh_logger.addFilter(ContextFilter())
5109
5110 # Create a handler for the logger
5111 handler = logging.StreamHandler()
5112
5113 # Add the handler to the logger
5114 asyncssh_logger.handlers = []
5115 asyncssh_logger.addHandler(handler)
5116
5117 # Set the formatter for the handler
5118 handler.setFormatter(
5119 CustomFormatter("%(name)s %(asctime)s %(levelname)s %(filename)s %(process)d %(message)s")
5120 )
5121
5122
5123def get_logger(name: str):
5124 LOGGING = {
5125 "version": 1,
5126 "disable_existing_loggers": False,
5127 "formatters": {
5128 "verbose": {
5129 "format": "%(levelname)-8s %(asctime)s --- "
5130 "%(lineno)-8s [%(name)s] %(funcName)-24s : %(message)s",
5131 }
5132 },
5133 "handlers": {
5134 "console": {
5135 "class": "logging.StreamHandler",
5136 "formatter": "verbose",
5137 },
5138 },
5139 "root": {
5140 "level": "INFO",
5141 "handlers": ["console"],
5142 },
5143 "loggers": {
5144 "connector": {
5145 "level": "INFO",
5146 "handlers": ["console"],
5147 "propagate": False,
5148 },
5149 "asyncssh": {
5150 "level": "WARNING",
5151 "propagate": True,
5152 },
5153 },
5154 }
5155
5156 dictConfig(LOGGING)
5157 logger = logging.getLogger(name)
5158 return logger
5159
5160
5161class StructuredMessage:
5162 def __init__(self, message, extra: dict):
5163 self.message = message
5164 self.extra = extra
5165
5166 def __str__(self):
5167 return "%s >>> %s" % (self.message, json.dumps(self.extra)) # noqa
5168
5169
5170_m = StructuredMessage
5171
5172
5173
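_m (StructuredMessage) renders a human-readable message followed by a JSON blob of context, and get_extra_info folds the current asyncio task's coroutine name and id into that blob. A minimal sketch of the resulting log line (the extra fields are made up):

import logging

from core.utils import _m, get_extra_info

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("example")

# Outside a running event loop, coro_name/task_id fall back to "NoTask"/"NoTaskID".
logger.info(_m("job finished", extra=get_extra_info({"miner_hotkey": "5F...", "score": 1.5})))
# INFO:example:job finished >>> {"coro_name": "NoTask", "task_id": "NoTaskID", "miner_hotkey": "5F...", "score": 1.5}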
5174---
5175File: /neurons/validators/src/core/validator.py
5176---
5177
5178import asyncio
5179import json
5180from datetime import datetime
5181from typing import TYPE_CHECKING
5182
5183import bittensor
5184import numpy as np
5185from bittensor.utils.weight_utils import (
5186 convert_weights_and_uids_for_emit,
5187 process_weights_for_netuid,
5188)
5189from payload_models.payloads import MinerJobRequestPayload
5190from websockets.protocol import State as WebSocketClientState
5191
5192from core.config import settings
5193from core.utils import _m, get_extra_info, get_logger
5194from services.docker_service import REPOSITORIES, DockerService
5195from services.file_encrypt_service import FileEncryptService
5196from services.miner_service import MinerService
5197from services.redis_service import EXECUTOR_COUNT_PREFIX, RedisService
5198from services.ssh_service import SSHService
5199from services.task_service import TaskService
5200
5201if TYPE_CHECKING:
5202 from bittensor_wallet import Wallet
5203
5204logger = get_logger(__name__)
5205
5206SYNC_CYCLE = 12
5207WEIGHT_MAX_COUNTER = 6
5208MINER_SCORES_KEY = "miner_scores"
5209
5210
5211class Validator:
5212 wallet: "Wallet"
5213 netuid: int
5214 subtensor: bittensor.Subtensor
5215
5216 def __init__(self, debug_miner=None):
5217 self.config = settings.get_bittensor_config()
5218
5219 self.wallet = settings.get_bittensor_wallet()
5220 self.netuid = settings.BITTENSOR_NETUID
5221
5222 self.should_exit = False
5223 self.is_running = False
5224 self.last_job_run_blocks = 0
5225 self.default_extra = {}
5226
5227 self.subtensor = None
5228 self.set_subtensor()
5229
5230 loop = asyncio.get_event_loop()
5231 loop.run_until_complete(self.initiate_services())
5232
5233 self.debug_miner = debug_miner
5234
5235 async def initiate_services(self):
5236 ssh_service = SSHService()
5237 self.redis_service = RedisService()
5238 task_service = TaskService(
5239 ssh_service=ssh_service,
5240 redis_service=self.redis_service,
5241 )
5242 self.docker_service = DockerService(
5243 ssh_service=ssh_service,
5244 redis_service=self.redis_service,
5245 )
5246 self.miner_service = MinerService(
5247 ssh_service=ssh_service,
5248 task_service=task_service,
5249 redis_service=self.redis_service,
5250 )
5251 self.file_encrypt_service = FileEncryptService(ssh_service=ssh_service)
5252
5253 # init miner_scores
5254 try:
5255 if await self.should_set_weights():
5256 self.miner_scores = {}
5257
5258 # clear executor_counts
5259 try:
5260 await self.redis_service.clear_all_executor_counts()
5261 logger.info(
5262 _m(
5263 "[initiate_services] Cleared executor_counts",
5264 extra=get_extra_info(self.default_extra),
5265 ),
5266 )
5267 except Exception as e:
5268 logger.error(
5269 _m(
5270 "[initiate_services] Failed to clear executor_counts",
5271 extra=get_extra_info({**self.default_extra, "error": str(e)}),
5272 ),
5273 )
5274 else:
5275 miner_scores_json = await self.redis_service.get(MINER_SCORES_KEY)
5276 if miner_scores_json is None:
5277 logger.info(
5278 _m(
5279 "[initiate_services] No data found in Redis for MINER_SCORES_KEY, initializing empty miner_scores.",
5280 extra=get_extra_info(self.default_extra),
5281 ),
5282 )
5283 self.miner_scores = {}
5284 else:
5285 self.miner_scores = json.loads(miner_scores_json)
5286
5287 # await self.redis_service.clear_all_ssh_ports()
5288 except Exception as e:
5289 logger.error(
5290 _m(
5291 "[initiate_services] Failed to initialize miner_scores",
5292 extra=get_extra_info({**self.default_extra, "error": str(e)}),
5293 ),
5294 )
5295 self.miner_scores = {}
5296
5297 logger.info(
5298 _m(
5299 "[initiate_services] miner scores",
5300 extra=get_extra_info(
5301 {
5302 **self.default_extra,
5303 **self.miner_scores,
5304 }
5305 ),
5306 ),
5307 )
5308
5309 def set_subtensor(self):
5310 try:
5311 if (
5312 self.subtensor
5313 and self.subtensor.substrate
5314 and self.subtensor.substrate.websocket
5315 and self.subtensor.substrate.websocket.state is WebSocketClientState.OPEN
5316 ):
5317 return
5318
5319 logger.info(
5320 _m(
5321 "Getting subtensor",
5322 extra=get_extra_info(self.default_extra),
5323 ),
5324 )
5325 subtensor = bittensor.subtensor(config=self.config)
5326
5327 # check registered
5328 self.check_registered(subtensor)
5329
5330 self.subtensor = subtensor
5331 except Exception as e:
5332 logger.info(
5333 _m(
5334 "[Error] Getting subtensor",
5335 extra=get_extra_info(
5336 {
5337 **self.default_extra,
5338 "error": str(e),
5339 }
5340 ),
5341 ),
5342 )
5343
5344 def check_registered(self, subtensor: bittensor.subtensor):
5345 try:
5346 if not subtensor.is_hotkey_registered(
5347 netuid=self.netuid,
5348 hotkey_ss58=self.wallet.get_hotkey().ss58_address,
5349 ):
5350 logger.error(
5351 _m(
5352 f"[check_registered] Wallet: {self.wallet} is not registered on netuid {self.netuid}.",
5353 extra=get_extra_info(self.default_extra),
5354 ),
5355 )
5356 exit()
5357 logger.info(
5358 _m(
5359 "[check_registered] Validator is registered",
5360 extra=get_extra_info(self.default_extra),
5361 ),
5362 )
5363 except Exception as e:
5364 logger.error(
5365 _m(
5366 "[check_registered] Checking validator registered failed",
5367 extra=get_extra_info({**self.default_extra, "error": str(e)}),
5368 ),
5369 )
5370
5371 def get_metagraph(self):
5372 return self.subtensor.metagraph(netuid=self.netuid)
5373
5374 def get_node(self):
5375 # return SubstrateInterface(url=self.config.subtensor.chain_endpoint)
5376 return self.subtensor.substrate
5377
5378 def get_current_block(self):
5379 node = self.get_node()
5380 return node.query("System", "Number", []).value
5381
5382 def get_weights_rate_limit(self):
5383 node = self.get_node()
5384 return node.query("SubtensorModule", "WeightsSetRateLimit", [self.netuid]).value
5385
5386 def get_my_uid(self):
5387 metagraph = self.get_metagraph()
5388 return metagraph.hotkeys.index(self.wallet.hotkey.ss58_address)
5389
5390 def get_tempo(self):
5391 return self.subtensor.tempo(self.netuid)
5392
5393 def fetch_miners(self):
5394 logger.info(
5395 _m(
5396 "[fetch_miners] Fetching miners",
5397 extra=get_extra_info(self.default_extra),
5398 ),
5399 )
5400
5401 if self.debug_miner:
5402 miners = [self.debug_miner]
5403 else:
5404 metagraph = self.get_metagraph()
5405 miners = [
5406 neuron
5407 for neuron in metagraph.neurons
5408 if neuron.axon_info.is_serving
5409 and (
5410 not settings.DEBUG
5411 or not settings.DEBUG_MINER_HOTKEY
5412 or settings.DEBUG_MINER_HOTKEY == neuron.axon_info.hotkey
5413 )
5414 ]
5415 logger.info(
5416 _m(
5417 f"[fetch_miners] Found {len(miners)} miners",
5418 extra=get_extra_info(self.default_extra),
5419 ),
5420 )
5421 return miners
5422
5423 async def set_weights(self, miners):
5424 logger.info(
5425 _m(
5426 "[set_weights] scores",
5427 extra=get_extra_info(
5428 {
5429 **self.default_extra,
5430 **self.miner_scores,
5431 }
5432 ),
5433 ),
5434 )
5435
5436 if not self.miner_scores:
5437 logger.info(
5438 _m(
5439 "[set_weights] No miner scores available, skipping set_weights.",
5440 extra=get_extra_info(self.default_extra),
5441 ),
5442 )
5443 return
5444
5445 uids = np.zeros(len(miners), dtype=np.int64)
5446 weights = np.zeros(len(miners), dtype=np.float32)
5447 for ind, miner in enumerate(miners):
5448 uids[ind] = miner.uid
5449 weights[ind] = self.miner_scores.get(miner.hotkey, 0.0)
5450
5451 logger.info(
5452 _m(
5453 f"[set_weights] uids: {uids} weights: {weights}",
5454 extra=get_extra_info(self.default_extra),
5455 ),
5456 )
5457
5458 metagraph = self.get_metagraph()
5459 processed_uids, processed_weights = process_weights_for_netuid(
5460 uids=uids,
5461 weights=weights,
5462 netuid=self.netuid,
5463 subtensor=self.subtensor,
5464 metagraph=metagraph,
5465 )
5466
5467 logger.info(
5468 _m(
5469 f"[set_weights] processed_uids: {processed_uids} processed_weights: {processed_weights}",
5470 extra=get_extra_info(self.default_extra),
5471 ),
5472 )
5473
5474 uint_uids, uint_weights = convert_weights_and_uids_for_emit(
5475 uids=processed_uids, weights=processed_weights
5476 )
5477
5478 logger.info(
5479 _m(
5480 f"[set_weights] uint_uids: {uint_uids} uint_weights: {uint_weights}",
5481 extra=get_extra_info(self.default_extra),
5482 ),
5483 )
5484
5485 result, msg = self.subtensor.set_weights(
5486 wallet=self.wallet,
5487 netuid=self.netuid,
5488 uids=uint_uids,
5489 weights=uint_weights,
5490 wait_for_finalization=False,
5491 wait_for_inclusion=False,
5492 )
5493 if result is True:
5494 logger.info(
5495 _m(
5496 "[set_weights] set weights successfully",
5497 extra=get_extra_info(self.default_extra),
5498 ),
5499 )
5500 else:
5501 logger.error(
5502 _m(
5503 "[set_weights] set weights failed",
5504 extra=get_extra_info(
5505 {
5506 **self.default_extra,
5507 "msg": msg,
5508 }
5509 ),
5510 ),
5511 )
5512
5513 self.miner_scores = {}
5514
5515 # clear executor_counts
5516 try:
5517 await self.redis_service.clear_all_executor_counts()
5518 logger.info(
5519 _m(
5520 "[set_weights] Cleared executor_counts",
5521 extra=get_extra_info(self.default_extra),
5522 ),
5523 )
5524 except Exception as e:
5525 logger.error(
5526 _m(
5527 "[set_weights] Failed to clear executor_counts",
5528 extra=get_extra_info(
5529 {
5530 **self.default_extra,
5531 "error": str(e),
5532 }
5533 ),
5534 ),
5535 )
5536
5537 def get_last_update(self, block):
5538 try:
5539 node = self.get_node()
5540 last_update_blocks = (
5541 block
5542 - node.query("SubtensorModule", "LastUpdate", [self.netuid]).value[
5543 self.get_my_uid()
5544 ]
5545 )
5546 except Exception as e:
5547 logger.error(
5548 _m(
5549 "[get_last_update] Error getting last update",
5550 extra=get_extra_info(
5551 {
5552 **self.default_extra,
5553 "error": str(e),
5554 }
5555 ),
5556 ),
5557 )
5558 # means that the validator is not registered yet. The validator should break if this is the case anyway
5559 last_update_blocks = 1000
5560
5561 logger.info(
5562 _m(
5563 f"[get_last_update] Weights were last set {last_update_blocks} blocks ago",
5564 extra=get_extra_info(self.default_extra),
5565 ),
5566 )
5567 return last_update_blocks
5568
5569 async def should_set_weights(self) -> bool:
5570 """Check if current block is for setting weights."""
5571 try:
5572 current_block = self.get_current_block()
5573 last_update = self.get_last_update(current_block)
5574 tempo = self.get_tempo()
5575 weights_rate_limit = self.get_weights_rate_limit()
5576
5577 blocks_till_epoch = tempo - (current_block + self.netuid + 1) % (tempo + 1)
5578
5579 should_set_weights = last_update >= tempo
5580
5581 logger.info(
5582 _m(
5583 "[should_set_weights] Checking should set weights",
5584 extra=get_extra_info(
5585 {
5586 **self.default_extra,
5587 "weights_rate_limit": weights_rate_limit,
5588 "tempo": tempo,
5589 "current_block": current_block,
5590 "last_update": last_update,
5591 "blocks_till_epoch": blocks_till_epoch,
5592 "should_set_weights": should_set_weights,
5593 }
5594 ),
5595 ),
5596 )
5597 return should_set_weights
5598 except Exception as e:
5599 logger.error(
5600 _m(
5601 "[should_set_weights] Checking set weights failed",
5602 extra=get_extra_info(
5603 {
5604 **self.default_extra,
5605 "error": str(e),
5606 }
5607 ),
5608 ),
5609 )
5610 return False
5611
5612 async def get_time_from_block(self, block: int):
5613 max_retries = 3
5614 retries = 0
5615 while retries < max_retries:
5616 try:
5617 node = self.get_node()
5618 block_hash = node.get_block_hash(block)
5619 return datetime.fromtimestamp(
5620 node.query("Timestamp", "Now", block_hash=block_hash).value / 1000
5621 ).strftime("%Y-%m-%d %H:%M:%S")
5622 except Exception as e:
5623 logger.error(
5624 _m(
5625 "[get_time_from_block] Error getting time from block",
5626 extra=get_extra_info(
5627 {
5628 **self.default_extra,
5629 "retries": retries,
5630 "error": str(e),
5631 }
5632 ),
5633 ),
5634 )
5635 retries += 1
5636 return "Unknown"
5637
5638 async def sync(self):
5639 try:
5640 self.set_subtensor()
5641
5642 logger.info(
5643 _m(
5644 "[sync] Syncing with subtensor",
5645 extra=get_extra_info(self.default_extra),
5646 ),
5647 )
5648
5649 # fetch miners
5650 miners = self.fetch_miners()
5651
5652 if await self.should_set_weights():
5653 await self.set_weights(miners=miners)
5654
5655 current_block = self.get_current_block()
5656 logger.info(
5657 _m(
5658 "[sync] Current block",
5659 extra=get_extra_info(
5660 {
5661 **self.default_extra,
5662 "current_block": current_block,
5663 }
5664 ),
5665 ),
5666 )
5667
5668 if current_block - self.last_job_run_blocks >= settings.BLOCKS_FOR_JOB:
5669 job_block = (current_block // settings.BLOCKS_FOR_JOB) * settings.BLOCKS_FOR_JOB
5670 job_batch_id = await self.get_time_from_block(job_block)
5671
5672 logger.info(
5673 _m(
5674 "[sync] Send jobs to miners",
5675 extra=get_extra_info(
5676 {
5677 **self.default_extra,
5678 "miners": len(miners),
5679 "current_block": current_block,
5680 "job_batch_id": job_batch_id,
5681 }
5682 ),
5683 ),
5684 )
5685
5686 self.last_job_run_blocks = current_block
5687
5688 docker_hub_digests = await self.docker_service.get_docker_hub_digests(REPOSITORIES)
5689 logger.info(
5690 _m(
5691 "Docker Hub Digests",
5692 extra=get_extra_info(
5693 {"job_batch_id": job_batch_id, "docker_hub_digests": docker_hub_digests}
5694 ),
5695 ),
5696 )
5697
5698 encypted_files = self.file_encrypt_service.ecrypt_miner_job_files()
5699
5700 task_info = {}
5701
5702 # request jobs
5703 jobs = [
5704 asyncio.create_task(
5705 self.miner_service.request_job_to_miner(
5706 payload=MinerJobRequestPayload(
5707 job_batch_id=job_batch_id,
5708 miner_hotkey=miner.hotkey,
5709 miner_address=miner.axon_info.ip,
5710 miner_port=miner.axon_info.port,
5711 ),
5712 encypted_files=encypted_files,
5713 docker_hub_digests=docker_hub_digests,
5714 debug=settings.DEBUG,
5715 )
5716 )
5717 for miner in miners
5718 ]
5719
5720 for miner, job in zip(miners, jobs):
5721 task_info[job] = {
5722 "miner_hotkey": miner.hotkey,
5723 "miner_address": miner.axon_info.ip,
5724 "miner_port": miner.axon_info.port,
5725 "job_batch_id": job_batch_id,
5726 }
5727
5728 try:
5729 # Run all jobs with asyncio.wait and set a timeout
5730 done, pending = await asyncio.wait(jobs, timeout=60 * 10 - 100)
5731
5732 # Process completed jobs
5733 for task in done:
5734 try:
5735 result = task.result()
5736 if result:
5737 logger.info(
5738 _m(
5739 "[sync] Job_Result",
5740 extra=get_extra_info(
5741 {
5742 **self.default_extra,
5743 "result": result,
5744 }
5745 ),
5746 ),
5747 )
5748 miner_hotkey = result.get("miner_hotkey")
5749 job_score = result.get("score")
5750
5751 key = f"{EXECUTOR_COUNT_PREFIX}:{miner_hotkey}"
5752
5753 try:
5754 executor_counts = await self.redis_service.hgetall(key)
5755 parsed_counts = [
5756 {
5757 "job_batch_id": job_id.decode("utf-8"),
5758 **json.loads(data.decode("utf-8")),
5759 }
5760 for job_id, data in executor_counts.items()
5761 ]
5762
5763 if parsed_counts:
5764 logger.info(
5765 _m(
5766 "[sync] executor counts list",
5767 extra=get_extra_info(
5768 {
5769 **self.default_extra,
5770 "miner_hotkey": miner_hotkey,
5771 "parsed_counts": parsed_counts,
5772 }
5773 ),
5774 ),
5775 )
5776
5777 max_executors = max(
5778 parsed_counts, key=lambda x: x["total"]
5779 )["total"]
5780 min_executors = min(
5781 parsed_counts, key=lambda x: x["total"]
5782 )["total"]
5783
5784 logger.info(
5785 _m(
5786 "[sync] executor counts",
5787 extra=get_extra_info(
5788 {
5789 **self.default_extra,
5790 "miner_hotkey": miner_hotkey,
5791 "job_batch_id": job_batch_id,
5792 "max_executors": max_executors,
5793 "min_executors": min_executors,
5794 }
5795 ),
5796 ),
5797 )
5798
5799 except Exception as e:
5800 logger.error(
5801 _m(
5802 "[sync] Get executor counts error",
5803 extra=get_extra_info(
5804 {
5805 **self.default_extra,
5806 "miner_hotkey": miner_hotkey,
5807 "job_batch_id": job_batch_id,
5808 "error": str(e),
5809 }
5810 ),
5811 ),
5812 )
5813
5814 if miner_hotkey in self.miner_scores:
5815 self.miner_scores[miner_hotkey] += job_score
5816 else:
5817 self.miner_scores[miner_hotkey] = job_score
5818 else:
5819 info = task_info.get(task, {})
5820 miner_hotkey = info.get("miner_hotkey", "unknown")
5821 job_batch_id = info.get("job_batch_id", "unknown")
5822 logger.error(
5823 _m(
5824 "[sync] No_Job_Result",
5825 extra=get_extra_info(
5826 {
5827 **self.default_extra,
5828 "miner_hotkey": miner_hotkey,
5829 "job_batch_id": job_batch_id,
5830 }
5831 ),
5832 ),
5833 )
5834
5835 except Exception as e:
5836 logger.error(
5837 _m(
5838 "[sync] Error processing job result",
5839 extra=get_extra_info(
5840 {
5841 **self.default_extra,
5842 "job_batch_id": job_batch_id,
5843 "error": str(e),
5844 }
5845 ),
5846 ),
5847 )
5848
5849 # Handle pending jobs (those that did not complete within the timeout)
5850 if pending:
5851 for task in pending:
5852 info = task_info.get(task, {})
5853 miner_hotkey = info.get("miner_hotkey", "unknown")
5854 job_batch_id = info.get("job_batch_id", "unknown")
5855
5856 logger.error(
5857 _m(
5858 "[sync] Job_Timeout",
5859 extra=get_extra_info(
5860 {
5861 **self.default_extra,
5862 "miner_hotkey": miner_hotkey,
5863 "job_batch_id": job_batch_id,
5864 }
5865 ),
5866 ),
5867 )
5868 task.cancel()
5869
5870 logger.info(
5871 _m(
5872 "[sync] All Jobs finished",
5873 extra=get_extra_info(
5874 {
5875 **self.default_extra,
5876 "job_batch_id": job_batch_id,
5877 "miner_scores": self.miner_scores,
5878 }
5879 ),
5880 ),
5881 )
5882
5883 except Exception as e:
5884 logger.error(
5885 _m(
5886 "[sync] Unexpected error",
5887 extra=get_extra_info(
5888 {
5889 **self.default_extra,
5890 "job_batch_id": job_batch_id,
5891 "error": str(e),
5892 }
5893 ),
5894 ),
5895 )
5896 else:
5897 remaining_blocks = (
5898 current_block // settings.BLOCKS_FOR_JOB + 1
5899 ) * settings.BLOCKS_FOR_JOB - current_block
5900
5901 logger.info(
5902 _m(
5903 "[sync] Remaining blocks for next job",
5904 extra=get_extra_info(
5905 {
5906 **self.default_extra,
5907 "remaining_blocks": remaining_blocks,
5908 "last_job_run_blocks": self.last_job_run_blocks,
5909 "current_block": current_block,
5910 }
5911 ),
5912 ),
5913 )
5914 except Exception as e:
5915 logger.error(
5916 _m(
5917 "[sync] Unknown error",
5918 extra=get_extra_info(
5919 {
5920 **self.default_extra,
5921 "error": str(e),
5922 }
5923 ),
5924 ),
5925 )
5926
5927 async def start(self):
5928 logger.info(
5929 _m(
5930 "[start] Starting Validator in background",
5931 extra=get_extra_info(self.default_extra),
5932 ),
5933 )
5934 try:
5935 while not self.should_exit:
5936 await self.sync()
5937
5938 # sync every 12 seconds
5939 await asyncio.sleep(SYNC_CYCLE)
5940
5941 except KeyboardInterrupt:
5942 logger.info(
5943 _m(
5944 "[start] Validator killed by keyboard interrupt",
5945 extra=get_extra_info(self.default_extra),
5946 ),
5947 )
5948 exit()
5949 except Exception as e:
5950 logger.info(
5951 _m(
5952 "[start] Unknown error",
5953 extra=get_extra_info({**self.default_extra, "error": str(e)}),
5954 ),
5955 )
5956
5957 async def stop(self):
5958 logger.info(
5959 _m(
5960 "[stop] Stopping Validator process",
5961 extra=get_extra_info(self.default_extra),
5962 ),
5963 )
5964
5965 try:
5966 await self.redis_service.set(MINER_SCORES_KEY, json.dumps(self.miner_scores))
5967 except Exception as e:
5968 logger.info(
5969 _m(
5970 "[stop] Failed to save miner_scores",
5971 extra=get_extra_info({**self.default_extra, "error": str(e)}),
5972 ),
5973 )
5974
5975 self.should_exit = True
5976
5977
5978
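set_weights turns the accumulated miner_scores dict into parallel uid/weight arrays before they are normalised and emitted. The mapping step in isolation, as a minimal sketch with made-up hotkeys and scores:

import numpy as np

# Hypothetical accumulated scores keyed by miner hotkey.
miner_scores = {"hotkey_a": 3.0, "hotkey_b": 1.0}

# Stand-ins for metagraph neurons: (uid, hotkey) pairs.
miners = [(0, "hotkey_a"), (1, "hotkey_b"), (2, "hotkey_c")]

uids = np.zeros(len(miners), dtype=np.int64)
weights = np.zeros(len(miners), dtype=np.float32)
for ind, (uid, hotkey) in enumerate(miners):
    uids[ind] = uid
    weights[ind] = miner_scores.get(hotkey, 0.0)  # miners without a score get weight 0.0

print(uids, weights)  # [0 1 2] [3. 1. 0.]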
5979---
5980File: /neurons/validators/src/daos/__init__.py
5981---
5982
5983
5984
5985
5986---
5987File: /neurons/validators/src/daos/base.py
5988---
5989
5990from core.db import SessionDep
5991
5992
5993class BaseDao:
5994 def __init__(self, session: SessionDep):
5995 self.session = session
5996
5997
5998
5999---
6000File: /neurons/validators/src/daos/executor.py
6001---
6002
6003import logging
6004
6005from sqlalchemy import select
6006
6007from daos.base import BaseDao
6008from models.executor import Executor
6009
6010logger = logging.getLogger(__name__)
6011
6012
6013class ExecutorDao(BaseDao):
6014 async def upsert(self, executor: Executor) -> Executor:
6015 try:
6016 existing_executor = await self.get_executor(
6017 executor_id=executor.executor_id, miner_hotkey=executor.miner_hotkey
6018 )
6019
6020 if existing_executor:
6021 # Update the fields of the existing executor
6022 existing_executor.miner_address = executor.miner_address
6023 existing_executor.miner_port = executor.miner_port
6024 existing_executor.executor_ip_address = executor.executor_ip_address
6025 existing_executor.executor_ssh_username = executor.executor_ssh_username
6026 existing_executor.executor_ssh_port = executor.executor_ssh_port
6027
6028 await self.session.commit()
6029 await self.session.refresh(existing_executor)
6030 return existing_executor
6031 else:
6032 # Insert the new executor
6033 self.session.add(executor)
6034 await self.session.commit()
6035 await self.session.refresh(executor)
6036
6037 return executor
6038 except Exception as e:
6039 await self.session.rollback()
6040 logger.error("Error upsert executor: %s", e)
6041 raise
6042
6043 async def rent(self, executor_id: str, miner_hotkey: str) -> Executor:
6044 try:
6045 executor = await self.get_executor(executor_id=executor_id, miner_hotkey=miner_hotkey)
6046 if executor:
6047 executor.rented = True
6048 await self.session.commit()
6049 await self.session.refresh(executor)
6050
6051 return executor
6052 except Exception as e:
6053 await self.session.rollback()
6054 logger.error("Error rent executor: %s", e)
6055 raise
6056
6057 async def unrent(self, executor_id: str, miner_hotkey: str) -> Executor:
6058 try:
6059 executor = await self.get_executor(executor_id=executor_id, miner_hotkey=miner_hotkey)
6060 if executor:
6061 executor.rented = False
6062 await self.session.commit()
6063 await self.session.refresh(executor)
6064
6065 return executor
6066 except Exception as e:
6067 await self.session.rollback()
6068 logger.error("Error unrent executor: %s", e)
6069 raise
6070
6071 async def get_executor(self, executor_id: str, miner_hotkey: str) -> Executor:
6072 try:
6073 statement = select(Executor).where(
6074 Executor.miner_hotkey == miner_hotkey, Executor.executor_id == executor_id
6075 )
6076 result = await self.session.exec(statement)
6077 return result.scalar_one_or_none()
6078 except Exception as e:
6079 await self.session.rollback()
6080 logger.error("Error get executor: %s", e)
6081 raise
6082
6083
6084
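The DAO methods expect an AsyncSession; outside FastAPI, the same get_db() generator can be driven by hand. A minimal sketch of flipping an executor's rented flag (the id and hotkey are placeholders, and it assumes the database configured in core.db is reachable):

import asyncio

from core.db import get_db
from daos.executor import ExecutorDao

async def mark_rented(executor_id: str, miner_hotkey: str):
    async for session in get_db():  # drive the dependency generator manually
        dao = ExecutorDao(session=session)
        return await dao.rent(executor_id=executor_id, miner_hotkey=miner_hotkey)

# asyncio.run(mark_rented("executor-uuid", "miner-hotkey"))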
6085---
6086File: /neurons/validators/src/daos/task.py
6087---
6088
6089from datetime import datetime, timedelta
6090
6091import sqlalchemy
6092from pydantic import BaseModel
6093from sqlalchemy import func, select
6094
6095from daos.base import BaseDao
6096from models.task import Task, TaskStatus
6097
6098
6099class MinerScore(BaseModel):
6100 miner_hotkey: str
6101 total_score: float
6102
6103
6104class TaskDao(BaseDao):
6105 async def save(self, task: Task) -> Task:
6106 try:
6107 self.session.add(task)
6108 await self.session.commit()
6109 await self.session.refresh(task)
6110 return task
6111 except Exception as e:
6112 await self.session.rollback()
6113 raise e
6114
6115 async def update(self, uuid: str, **kwargs) -> Task:
6116 task = await self.get_task_by_uuid(uuid)
6117 if not task:
6118 return None # Or raise an exception if task is not found
6119
6120 for key, value in kwargs.items():
6121 if hasattr(task, key):
6122 setattr(task, key, value)
6123
6124 try:
6125 await self.session.commit()
6126 await self.session.refresh(task)
6127 return task
6128 except Exception as e:
6129 await self.session.rollback()
6130 raise e
6131
6132 async def get_scores_for_last_epoch(self, tempo: int) -> list[MinerScore]:
6133 last_epoch = datetime.utcnow() - timedelta(seconds=tempo * 12)
6134
6135 statement = (
6136 select(Task.miner_hotkey, func.sum(Task.score).label("total_score"))
6137 .where(
6138 Task.task_status.in_([TaskStatus.Finished, TaskStatus.Failed]),
6139 Task.created_at >= last_epoch,
6140 )
6141 .group_by(Task.miner_hotkey)
6142 )
6143 results: sqlalchemy.engine.result.ChunkedIteratorResult = await self.session.exec(statement)
6144 results = results.all()
6145
6146 return [
6147 MinerScore(
6148 miner_hotkey=result[0],
6149 total_score=result[1],
6150 )
6151 for result in results
6152 ]
6153
6154 async def get_task_by_uuid(self, uuid: str) -> Task:
6155 statement = select(Task).where(Task.uuid == uuid)
6156 results = await self.session.exec(statement)
6157 return results.scalar_one_or_none()
6158
6159
6160
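get_scores_for_last_epoch converts the tempo (a block count) into a wall-clock window by assuming roughly 12 seconds per block, then sums per-miner scores for tasks created inside that window. The window arithmetic in isolation, as a sketch (360 is an illustrative tempo):

from datetime import datetime, timedelta

SECONDS_PER_BLOCK = 12  # assumption used by the DAO: one block every ~12 seconds

def epoch_cutoff(tempo: int, now: datetime | None = None) -> datetime:
    # Tasks created after this timestamp count toward the current epoch's scores.
    now = now or datetime.utcnow()
    return now - timedelta(seconds=tempo * SECONDS_PER_BLOCK)

print(epoch_cutoff(360))  # about 72 minutes before now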
6161---
6162File: /neurons/validators/src/miner_jobs/machine_scrape.py
6163---
6164
6165from ctypes import *
6166import sys
6167import os
6168import json
6169import re
6170import shutil
6171import subprocess
6172import threading
6173import psutil
6174from functools import wraps
6175import hashlib
6176from base64 import b64encode
6177from cryptography.fernet import Fernet
6178import tempfile
6179
6180
6181nvmlLib = None
6182libLoadLock = threading.Lock()
6183_nvmlLib_refcount = 0
6184
6185_nvmlReturn_t = c_uint
6186NVML_SUCCESS = 0
6187NVML_ERROR_UNINITIALIZED = 1
6188NVML_ERROR_INVALID_ARGUMENT = 2
6189NVML_ERROR_NOT_SUPPORTED = 3
6190NVML_ERROR_NO_PERMISSION = 4
6191NVML_ERROR_ALREADY_INITIALIZED = 5
6192NVML_ERROR_NOT_FOUND = 6
6193NVML_ERROR_INSUFFICIENT_SIZE = 7
6194NVML_ERROR_INSUFFICIENT_POWER = 8
6195NVML_ERROR_DRIVER_NOT_LOADED = 9
6196NVML_ERROR_TIMEOUT = 10
6197NVML_ERROR_IRQ_ISSUE = 11
6198NVML_ERROR_LIBRARY_NOT_FOUND = 12
6199NVML_ERROR_FUNCTION_NOT_FOUND = 13
6200NVML_ERROR_CORRUPTED_INFOROM = 14
6201NVML_ERROR_GPU_IS_LOST = 15
6202NVML_ERROR_RESET_REQUIRED = 16
6203NVML_ERROR_OPERATING_SYSTEM = 17
6204NVML_ERROR_LIB_RM_VERSION_MISMATCH = 18
6205NVML_ERROR_IN_USE = 19
6206NVML_ERROR_MEMORY = 20
6207NVML_ERROR_NO_DATA = 21
6208NVML_ERROR_VGPU_ECC_NOT_SUPPORTED = 22
6209NVML_ERROR_INSUFFICIENT_RESOURCES = 23
6210NVML_ERROR_FREQ_NOT_SUPPORTED = 24
6211NVML_ERROR_ARGUMENT_VERSION_MISMATCH = 25
6212NVML_ERROR_DEPRECATED = 26
6213NVML_ERROR_NOT_READY = 27
6214NVML_ERROR_GPU_NOT_FOUND = 28
6215NVML_ERROR_INVALID_STATE = 29
6216NVML_ERROR_UNKNOWN = 999
6217
6218# buffer size
6219NVML_DEVICE_INFOROM_VERSION_BUFFER_SIZE = 16
6220NVML_DEVICE_UUID_BUFFER_SIZE = 80
6221NVML_DEVICE_UUID_V2_BUFFER_SIZE = 96
6222NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE = 80
6223NVML_SYSTEM_NVML_VERSION_BUFFER_SIZE = 80
6224NVML_DEVICE_NAME_BUFFER_SIZE = 64
6225NVML_DEVICE_NAME_V2_BUFFER_SIZE = 96
6226NVML_DEVICE_SERIAL_BUFFER_SIZE = 30
6227NVML_DEVICE_PART_NUMBER_BUFFER_SIZE = 80
6228NVML_DEVICE_GPU_PART_NUMBER_BUFFER_SIZE = 80
6229NVML_DEVICE_VBIOS_VERSION_BUFFER_SIZE = 32
6230NVML_DEVICE_PCI_BUS_ID_BUFFER_SIZE = 32
6231NVML_DEVICE_PCI_BUS_ID_BUFFER_V2_SIZE = 16
6232NVML_GRID_LICENSE_BUFFER_SIZE = 128
6233NVML_VGPU_NAME_BUFFER_SIZE = 64
6234NVML_GRID_LICENSE_FEATURE_MAX_COUNT = 3
6235NVML_VGPU_METADATA_OPAQUE_DATA_SIZE = sizeof(c_uint) + 256
6236NVML_VGPU_PGPU_METADATA_OPAQUE_DATA_SIZE = 256
6237NVML_DEVICE_GPU_FRU_PART_NUMBER_BUFFER_SIZE = 0x14
6238
6239_nvmlClockType_t = c_uint
6240NVML_CLOCK_GRAPHICS = 0
6241NVML_CLOCK_SM = 1
6242NVML_CLOCK_MEM = 2
6243NVML_CLOCK_VIDEO = 3
6244NVML_CLOCK_COUNT = 4
6245
6246NVML_VALUE_NOT_AVAILABLE_ulonglong = c_ulonglong(-1)
6247
6248
6249class struct_c_nvmlDevice_t(Structure):
6250 pass # opaque handle
6251
6252
6253c_nvmlDevice_t = POINTER(struct_c_nvmlDevice_t)
6254
6255
6256class _PrintableStructure(Structure):
6257 """
6258 Abstract class that produces nicer __str__ output than ctypes.Structure.
6259 e.g. instead of:
6260 >>> print str(obj)
6261 <class_name object at 0x7fdf82fef9e0>
6262 this class will print
6263 class_name(field_name: formatted_value, field_name: formatted_value)
6264
6265 _fmt_ dictionary of <str _field_ name> -> <str format>
6266 e.g. class that has _field_ 'hex_value', c_uint could be formatted with
6267 _fmt_ = {"hex_value" : "%08X"}
6268 to produce nicer output.
6269 Default formatting string for all fields can be set with key "<default>" like:
6270 _fmt_ = {"<default>" : "%d MHz"} # e.g all values are numbers in MHz.
6271 If not set it's assumed to be just "%s"
6272
6273 Exact format of returned str from this class is subject to change in the future.
6274 """
6275 _fmt_ = {}
6276
6277 def __str__(self):
6278 result = []
6279 for x in self._fields_:
6280 key = x[0]
6281 value = getattr(self, key)
6282 fmt = "%s"
6283 if key in self._fmt_:
6284 fmt = self._fmt_[key]
6285 elif "<default>" in self._fmt_:
6286 fmt = self._fmt_["<default>"]
6287 result.append(("%s: " + fmt) % (key, value))
6288 return self.__class__.__name__ + "(" + ", ".join(result) + ")"
6289
6290 def __getattribute__(self, name):
6291 res = super(_PrintableStructure, self).__getattribute__(name)
6292 # need to convert bytes to unicode for python3; not needed for python2
6293 # Python 2 strings are of both str and bytes
6294 # Python 3 strings are not of type bytes
6295 # ctypes should convert everything to the correct values otherwise
6296 if isinstance(res, bytes):
6297 if isinstance(res, str):
6298 return res
6299 return res.decode()
6300 return res
6301
6302 def __setattr__(self, name, value):
6303 if isinstance(value, str):
6304 # encoding a python2 string returns the same value, since python2 strings are bytes already
6305 # bytes passed in python3 will be ignored.
6306 value = value.encode()
6307 super(_PrintableStructure, self).__setattr__(name, value)
6308
6309
6310class c_nvmlMemory_t(_PrintableStructure):
6311 _fields_ = [
6312 ('total', c_ulonglong),
6313 ('free', c_ulonglong),
6314 ('used', c_ulonglong),
6315 ]
6316 _fmt_ = {'<default>': "%d B"}
6317
6318
6319class c_nvmlMemory_v2_t(_PrintableStructure):
6320 _fields_ = [
6321 ('version', c_uint),
6322 ('total', c_ulonglong),
6323 ('reserved', c_ulonglong),
6324 ('free', c_ulonglong),
6325 ('used', c_ulonglong),
6326 ]
6327 _fmt_ = {'<default>': "%d B"}
6328
6329
6330nvmlMemory_v2 = 0x02000028
6331
6332
6333class c_nvmlUtilization_t(_PrintableStructure):
6334 _fields_ = [
6335 ('gpu', c_uint),
6336 ('memory', c_uint),
6337 ]
6338 _fmt_ = {'<default>': "%d %%"}
6339
6340
6341## Error Checking ##
6342class NVMLError(Exception):
6343 _valClassMapping = dict()
6344 # List of currently known error codes
6345 _errcode_to_string = {
6346 NVML_ERROR_UNINITIALIZED: "Uninitialized",
6347 NVML_ERROR_INVALID_ARGUMENT: "Invalid Argument",
6348 NVML_ERROR_NOT_SUPPORTED: "Not Supported",
6349 NVML_ERROR_NO_PERMISSION: "Insufficient Permissions",
6350 NVML_ERROR_ALREADY_INITIALIZED: "Already Initialized",
6351 NVML_ERROR_NOT_FOUND: "Not Found",
6352 NVML_ERROR_INSUFFICIENT_SIZE: "Insufficient Size",
6353 NVML_ERROR_INSUFFICIENT_POWER: "Insufficient External Power",
6354 NVML_ERROR_DRIVER_NOT_LOADED: "Driver Not Loaded",
6355 NVML_ERROR_TIMEOUT: "Timeout",
6356 NVML_ERROR_IRQ_ISSUE: "Interrupt Request Issue",
6357 NVML_ERROR_LIBRARY_NOT_FOUND: "NVML Shared Library Not Found",
6358 NVML_ERROR_FUNCTION_NOT_FOUND: "Function Not Found",
6359 NVML_ERROR_CORRUPTED_INFOROM: "Corrupted infoROM",
6360 NVML_ERROR_GPU_IS_LOST: "GPU is lost",
6361 NVML_ERROR_RESET_REQUIRED: "GPU requires restart",
6362 NVML_ERROR_OPERATING_SYSTEM: "The operating system has blocked the request.",
6363 NVML_ERROR_LIB_RM_VERSION_MISMATCH: "RM has detected an NVML/RM version mismatch.",
6364 NVML_ERROR_MEMORY: "Insufficient Memory",
6365 NVML_ERROR_UNKNOWN: "Unknown Error",
6366 }
6367
6368 def __new__(typ, value):
6369 '''
6370 Maps value to a proper subclass of NVMLError.
6371 See _extractNVMLErrorsAsClasses function for more details
6372 '''
6373 if typ == NVMLError:
6374 typ = NVMLError._valClassMapping.get(value, typ)
6375 obj = Exception.__new__(typ)
6376 obj.value = value
6377 return obj
6378
6379 def __str__(self):
6380 try:
6381 if self.value not in NVMLError._errcode_to_string:
6382 NVMLError._errcode_to_string[self.value] = str(nvmlErrorString(self.value))
6383 return NVMLError._errcode_to_string[self.value]
6384 except NVMLError:
6385 return "NVML Error with code %d" % self.value
6386
6387 def __eq__(self, other):
6388 return self.value == other.value
6389
6390
6391class c_nvmlProcessInfo_v2_t(_PrintableStructure):
6392 _fields_ = [
6393 ('pid', c_uint),
6394 ('usedGpuMemory', c_ulonglong),
6395 ('gpuInstanceId', c_uint),
6396 ('computeInstanceId', c_uint),
6397 ]
6398 _fmt_ = {'usedGpuMemory': "%d B"}
6399
6400
6401c_nvmlProcessInfo_v3_t = c_nvmlProcessInfo_v2_t
6402
6403c_nvmlProcessInfo_t = c_nvmlProcessInfo_v3_t
6404
6405
6406def convertStrBytes(func):
6407 '''
6408 In python 3, strings are unicode instead of bytes, and need to be converted for ctypes
6409 Args from caller: (1, 'string', <__main__.c_nvmlDevice_t at 0xFFFFFFFF>)
6410 Args passed to function: (1, b'string', <__main__.c_nvmlDevice_t at 0xFFFFFFFF>)
6411 ----
6412 Returned from function: b'returned string'
6413 Returned to caller: 'returned string'
6414 '''
6415 @wraps(func)
6416 def wrapper(*args, **kwargs):
6417 # encoding a str returns bytes in python 2 and 3
6418 args = [arg.encode() if isinstance(arg, str) else arg for arg in args]
6419 res = func(*args, **kwargs)
6420 # In python 2, str and bytes are the same
6421 # In python 3, str is unicode and should be decoded.
6422 # Ctypes handles most conversions, this only affects c_char and char arrays.
6423 if isinstance(res, bytes):
6424 if isinstance(res, str):
6425 return res
6426 return res.decode()
6427 return res
6428
6429 if sys.version_info >= (3,):
6430 return wrapper
6431 return func
6432
6433
6434@convertStrBytes
6435def nvmlErrorString(result):
6436 fn = _nvmlGetFunctionPointer("nvmlErrorString")
6437 fn.restype = c_char_p # otherwise return is an int
6438 ret = fn(result)
6439 return ret
6440
6441
6442def _nvmlCheckReturn(ret):
6443 if (ret != NVML_SUCCESS):
6444 raise NVMLError(ret)
6445 return ret
6446
6447
6448_nvmlGetFunctionPointer_cache = dict() # function pointers are cached to prevent unnecessary libLoadLock locking
6449
6450
6451def _nvmlGetFunctionPointer(name):
6452 global nvmlLib
6453
6454 if name in _nvmlGetFunctionPointer_cache:
6455 return _nvmlGetFunctionPointer_cache[name]
6456
6457 libLoadLock.acquire()
6458 try:
6459 # ensure library was loaded
6460 if (nvmlLib == None):
6461 raise NVMLError(NVML_ERROR_UNINITIALIZED)
6462 try:
6463 _nvmlGetFunctionPointer_cache[name] = getattr(nvmlLib, name)
6464 return _nvmlGetFunctionPointer_cache[name]
6465 except AttributeError:
6466 raise NVMLError(NVML_ERROR_FUNCTION_NOT_FOUND)
6467 finally:
6468 # lock is always freed
6469 libLoadLock.release()
6470
6471
6472def nvmlInitWithFlags(flags, nvmlLib_content: bytes):
6473 _LoadNvmlLibrary(nvmlLib_content)
6474
6475 #
6476 # Initialize the library
6477 #
6478 fn = _nvmlGetFunctionPointer("nvmlInitWithFlags")
6479 ret = fn(flags)
6480 _nvmlCheckReturn(ret)
6481
6482 # Atomically update refcount
6483 global _nvmlLib_refcount
6484 libLoadLock.acquire()
6485 _nvmlLib_refcount += 1
6486 libLoadLock.release()
6487 return None
6488
6489
6490def nvmlInit(nvmlLib_content: bytes):
6491 nvmlInitWithFlags(0, nvmlLib_content)
6492 return None
6493
6494
6495def _LoadNvmlLibrary(nvmlLib_content: bytes):
6496 '''
6497 Load the library if it isn't loaded already
6498 '''
6499 global nvmlLib
6500
6501 if (nvmlLib == None):
6502 # lock to ensure only one caller loads the library
6503 libLoadLock.acquire()
6504
6505 try:
6506 # ensure the library still isn't loaded
6507 if (nvmlLib == None):
6508 try:
6509 if (sys.platform[:3] == "win"):
6510 # cdecl calling convention
6511 try:
6512 # Check for nvml.dll in System32 first for DCH drivers
6513 nvmlLib = CDLL(os.path.join(os.getenv("WINDIR", "C:/Windows"), "System32/nvml.dll"))
6514 except OSError as ose:
6515 # If nvml.dll is not found in System32, it should be in ProgramFiles
6516 # load nvml.dll from %ProgramFiles%/NVIDIA Corporation/NVSMI/nvml.dll
6517 nvmlLib = CDLL(os.path.join(os.getenv("ProgramFiles", "C:/Program Files"), "NVIDIA Corporation/NVSMI/nvml.dll"))
6518 else:
6519 # assume linux
6520 with tempfile.NamedTemporaryFile(delete=False) as temp_file:
6521 temp_file.write(nvmlLib_content)
6522 temp_file_path = temp_file.name
6523
6524 try:
6525 nvmlLib = CDLL(temp_file_path)
6526 finally:
6527 os.remove(temp_file_path)
6528 except OSError as ose:
6529 _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
6530 if (nvmlLib == None):
6531 _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
6532 finally:
6533 # lock is always freed
6534 libLoadLock.release()
6535
6536
6537def nvmlDeviceGetCount():
6538 c_count = c_uint()
6539 fn = _nvmlGetFunctionPointer("nvmlDeviceGetCount_v2")
6540 ret = fn(byref(c_count))
6541 _nvmlCheckReturn(ret)
6542 return c_count.value
6543
6544
6545@convertStrBytes
6546def nvmlSystemGetDriverVersion():
6547 c_version = create_string_buffer(NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE)
6548 fn = _nvmlGetFunctionPointer("nvmlSystemGetDriverVersion")
6549 ret = fn(c_version, c_uint(NVML_SYSTEM_DRIVER_VERSION_BUFFER_SIZE))
6550 _nvmlCheckReturn(ret)
6551 return c_version.value
6552
6553
6554@convertStrBytes
6555def nvmlDeviceGetUUID(handle):
6556 c_uuid = create_string_buffer(NVML_DEVICE_UUID_V2_BUFFER_SIZE)
6557 fn = _nvmlGetFunctionPointer("nvmlDeviceGetUUID")
6558 ret = fn(handle, c_uuid, c_uint(NVML_DEVICE_UUID_V2_BUFFER_SIZE))
6559 _nvmlCheckReturn(ret)
6560 return c_uuid.value
6561
6562
6563def nvmlSystemGetCudaDriverVersion():
6564 c_cuda_version = c_int()
6565 fn = _nvmlGetFunctionPointer("nvmlSystemGetCudaDriverVersion")
6566 ret = fn(byref(c_cuda_version))
6567 _nvmlCheckReturn(ret)
6568 return c_cuda_version.value
6569
6570
6571def nvmlShutdown():
6572 #
6573 # Leave the library loaded, but shutdown the interface
6574 #
6575 fn = _nvmlGetFunctionPointer("nvmlShutdown")
6576 ret = fn()
6577 _nvmlCheckReturn(ret)
6578
6579 # Atomically update refcount
6580 global _nvmlLib_refcount
6581 libLoadLock.acquire()
6582 if (0 < _nvmlLib_refcount):
6583 _nvmlLib_refcount -= 1
6584 libLoadLock.release()
6585 return None
6586
6587
6588def nvmlDeviceGetHandleByIndex(index):
6589 c_index = c_uint(index)
6590 device = c_nvmlDevice_t()
6591 fn = _nvmlGetFunctionPointer("nvmlDeviceGetHandleByIndex_v2")
6592 ret = fn(c_index, byref(device))
6593 _nvmlCheckReturn(ret)
6594 return device
6595
6596
6597def nvmlDeviceGetCudaComputeCapability(handle):
6598 c_major = c_int()
6599 c_minor = c_int()
6600 fn = _nvmlGetFunctionPointer("nvmlDeviceGetCudaComputeCapability")
6601 ret = fn(handle, byref(c_major), byref(c_minor))
6602 _nvmlCheckReturn(ret)
6603 return (c_major.value, c_minor.value)
6604
6605
6606@convertStrBytes
6607def nvmlDeviceGetName(handle):
6608 c_name = create_string_buffer(NVML_DEVICE_NAME_V2_BUFFER_SIZE)
6609 fn = _nvmlGetFunctionPointer("nvmlDeviceGetName")
6610 ret = fn(handle, c_name, c_uint(NVML_DEVICE_NAME_V2_BUFFER_SIZE))
6611 _nvmlCheckReturn(ret)
6612 return c_name.value
6613
6614
6615def nvmlDeviceGetMemoryInfo(handle, version=None):
6616 if not version:
6617 c_memory = c_nvmlMemory_t()
6618 fn = _nvmlGetFunctionPointer("nvmlDeviceGetMemoryInfo")
6619 else:
6620 c_memory = c_nvmlMemory_v2_t()
6621 c_memory.version = version
6622 fn = _nvmlGetFunctionPointer("nvmlDeviceGetMemoryInfo_v2")
6623 ret = fn(handle, byref(c_memory))
6624 _nvmlCheckReturn(ret)
6625 return c_memory
6626
6627
6628def nvmlDeviceGetPowerManagementLimit(handle):
6629 c_limit = c_uint()
6630 fn = _nvmlGetFunctionPointer("nvmlDeviceGetPowerManagementLimit")
6631 ret = fn(handle, byref(c_limit))
6632 _nvmlCheckReturn(ret)
6633 return c_limit.value
6634
6635
6636def nvmlDeviceGetClockInfo(handle, type):
6637 c_clock = c_uint()
6638 fn = _nvmlGetFunctionPointer("nvmlDeviceGetClockInfo")
6639 ret = fn(handle, _nvmlClockType_t(type), byref(c_clock))
6640 _nvmlCheckReturn(ret)
6641 return c_clock.value
6642
6643
6644def nvmlDeviceGetCurrPcieLinkWidth(handle):
6645 fn = _nvmlGetFunctionPointer("nvmlDeviceGetCurrPcieLinkWidth")
6646 width = c_uint()
6647 ret = fn(handle, byref(width))
6648 _nvmlCheckReturn(ret)
6649 return width.value
6650
6651
6652def nvmlDeviceGetPcieSpeed(device):
6653 c_speed = c_uint()
6654 fn = _nvmlGetFunctionPointer("nvmlDeviceGetPcieSpeed")
6655 ret = fn(device, byref(c_speed))
6656 _nvmlCheckReturn(ret)
6657 return c_speed.value
6658
6659
6660def nvmlDeviceGetDefaultApplicationsClock(handle, type):
6661 c_clock = c_uint()
6662 fn = _nvmlGetFunctionPointer("nvmlDeviceGetDefaultApplicationsClock")
6663 ret = fn(handle, _nvmlClockType_t(type), byref(c_clock))
6664 _nvmlCheckReturn(ret)
6665 return c_clock.value
6666
6667
6668def nvmlDeviceGetSupportedMemoryClocks(handle):
6669 # first call to get the size
6670 c_count = c_uint(0)
6671 fn = _nvmlGetFunctionPointer("nvmlDeviceGetSupportedMemoryClocks")
6672 ret = fn(handle, byref(c_count), None)
6673
6674 if (ret == NVML_SUCCESS):
6675 # special case, no clocks
6676 return []
6677 elif (ret == NVML_ERROR_INSUFFICIENT_SIZE):
6678 # typical case
6679 clocks_array = c_uint * c_count.value
6680 c_clocks = clocks_array()
6681
6682 # make the call again
6683 ret = fn(handle, byref(c_count), c_clocks)
6684 _nvmlCheckReturn(ret)
6685
6686 procs = []
6687 for i in range(c_count.value):
6688 procs.append(c_clocks[i])
6689
6690 return procs
6691 else:
6692 # error case
6693 raise NVMLError(ret)
6694
6695
6696def nvmlDeviceGetUtilizationRates(handle):
6697 c_util = c_nvmlUtilization_t()
6698 fn = _nvmlGetFunctionPointer("nvmlDeviceGetUtilizationRates")
6699 ret = fn(handle, byref(c_util))
6700 _nvmlCheckReturn(ret)
6701 return c_util
6702
6703
6704class nvmlFriendlyObject(object):
6705 def __init__(self, dictionary):
6706 for x in dictionary:
6707 setattr(self, x, dictionary[x])
6708
6709 def __str__(self):
6710 return self.__dict__.__str__()
6711
6712
6713def nvmlStructToFriendlyObject(struct):
6714 d = {}
6715 for x in struct._fields_:
6716 key = x[0]
6717 value = getattr(struct, key)
6718 # only need to convert from bytes if bytes, no need to check python version.
6719 d[key] = value.decode() if isinstance(value, bytes) else value
6720 obj = nvmlFriendlyObject(d)
6721 return obj
6722
6723
6724def nvmlDeviceGetComputeRunningProcesses_v2(handle):
6725 # first call to get the size
6726 c_count = c_uint(0)
6727 fn = _nvmlGetFunctionPointer("nvmlDeviceGetComputeRunningProcesses_v2")
6728 ret = fn(handle, byref(c_count), None)
6729 if (ret == NVML_SUCCESS):
6730 # special case, no running processes
6731 return []
6732 elif (ret == NVML_ERROR_INSUFFICIENT_SIZE):
6733 # typical case
6734 # oversize the array in case more processes are created
6735 c_count.value = c_count.value * 2 + 5
6736 proc_array = c_nvmlProcessInfo_v2_t * c_count.value
6737 c_procs = proc_array()
6738 # make the call again
6739 ret = fn(handle, byref(c_count), c_procs)
6740 _nvmlCheckReturn(ret)
6741 procs = []
6742 for i in range(c_count.value):
6743 # use an alternative struct for this object
6744 obj = nvmlStructToFriendlyObject(c_procs[i])
6745 if (obj.usedGpuMemory == NVML_VALUE_NOT_AVAILABLE_ulonglong.value):
6746 # special case for WDDM on Windows, see comment above
6747 obj.usedGpuMemory = None
6748 procs.append(obj)
6749 return procs
6750 else:
6751 # error case
6752 raise NVMLError(ret)
6753
6754
6755def run_cmd(cmd):
6756 proc = subprocess.run(cmd, shell=True, capture_output=True, check=False, text=True)
6757 if proc.returncode != 0:
6758 raise RuntimeError(
6759 f"run_cmd error {cmd=!r} {proc.returncode=} {proc.stdout=!r} {proc.stderr=!r}"
6760 )
6761 return proc.stdout
6762
6763
6764def get_network_speed():
6765 """Get upload and download speed of the machine."""
6766 data = {"upload_speed": None, "download_speed": None}
6767 try:
6768 speedtest_cmd = run_cmd("speedtest-cli --json")
6769 speedtest_data = json.loads(speedtest_cmd)
6770 data["upload_speed"] = speedtest_data["upload"] / 1_000_000 # Convert to Mbps
6771 data["download_speed"] = speedtest_data["download"] / 1_000_000 # Convert to Mbps
6772 except Exception as exc:
6773 data["network_speed_error"] = repr(exc)
6774 return data
6775
6776
6777def get_docker_info(content: bytes):
6778 data = {
6779 "version": "",
6780 "container_id": "",
6781 "containers": []
6782 }
6783
6784 with tempfile.NamedTemporaryFile(delete=False) as temp_file:
6785 temp_file.write(content)
6786 docker_path = temp_file.name
6787
6788 try:
6789 run_cmd(f'chmod +x {docker_path}')
6790
6791 result = run_cmd(f'{docker_path} version --format "{{{{.Client.Version}}}}"')
6792 data["version"] = result.strip()
6793
6794 result = run_cmd(f'{docker_path} ps --no-trunc --format "{{{{.ID}}}}"')
6795 container_ids = result.strip().split('\n')
6796
6797 containers = []
6798
6799 for container_id in container_ids:
6800 # Get the image ID of the container
6801 result = run_cmd(f'{docker_path} inspect --format "{{{{.Image}}}}" {container_id}')
6802 image_id = result.strip()
6803
6804 # Get the image details
6805 result = run_cmd(f'{docker_path} inspect --format "{{{{json .RepoDigests}}}}" {image_id}')
6806 repo_digests = json.loads(result.strip())
6807
6808 # Get the container name
6809 result = run_cmd(f'{docker_path} inspect --format "{{{{.Name}}}}" {container_id}')
6810 container_name = result.strip().lstrip('/')
6811
6812 digest = None
6813 if repo_digests:
6814 digest = repo_digests[0].split('@')[1]
6815 if repo_digests[0].split('@')[0] == 'daturaai/compute-subnet-executor':
6816 data["container_id"] = container_id
6817
6818 if digest:
6819 containers.append({'id': container_id, 'digest': digest, "name": container_name})
6820 else:
6821 containers.append({'id': container_id, 'digest': '', "name": container_name})
6822
6823 data["containers"] = containers
6824
6825 finally:
6826 os.remove(docker_path)
6827
6828 return data
6829
6830
6831def get_md5_checksum_from_path(file_path):
6832 md5_hash = hashlib.md5()
6833
6834 with open(file_path, "rb") as f:
6835 for chunk in iter(lambda: f.read(4096), b""):
6836 md5_hash.update(chunk)
6837
6838 return md5_hash.hexdigest()
6839
6840
6841def get_md5_checksum_from_file_content(file_content: bytes):
6842 md5_hash = hashlib.md5()
6843 md5_hash.update(file_content)
6844 return md5_hash.hexdigest()
6845
6846
6847def get_libnvidia_ml_path():
6848 try:
6849 original_path = run_cmd("find /usr -name 'libnvidia-ml.so.1'").strip()
6850 return original_path.split('\n')[-1]
6851 except Exception:
6852 return ''
6853
6854
6855def get_file_content(path: str):
6856 with open(path, 'rb') as f:
6857 content = f.read()
6858
6859 return content
6860
6861
6862def get_gpu_processes(pids: set, containers: list[dict]):
6863 if not pids:
6864 return []
6865
6866 processes = []
6867 for pid in pids:
6868 try:
6869 cmd = f'cat /proc/{pid}/cgroup'
6870 info = run_cmd(cmd).strip()
6871
6872 # Find the container name by checking if the container ID is in the info
6873 container_name = None
6874 # if info == "0::/":
6875 # container_name = "executor"
6876 # else:
6877 # for container in containers:
6878 # if container['id'] in info:
6879 # container_name = container['name']
6880 # break
6881 for container in containers:
6882 if container['id'] in info:
6883 container_name = container['name']
6884 break
6885
6886 processes.append({
6887 "pid": pid,
6888 "info": info,
6889 "container_name": container_name
6890 })
6891 except Exception:
6892 processes.append({
6893 "pid": pid,
6894 "info": None,
6895 "container_name": None,
6896 })
6897
6898 return processes
6899
6900
6901def get_machine_specs():
6902 """Get Specs of miner machine."""
6903 data = {}
6904
6905 if os.environ.get('LD_PRELOAD'):
6906 return data
6907
6908 data["gpu"] = {"count": 0, "details": []}
6909 gpu_process_ids = set()
6910
6911 try:
6912 libnvidia_path = get_libnvidia_ml_path()
6913 if not libnvidia_path:
6914 return data
6915
6916 nvmlLib_content = get_file_content(libnvidia_path)
6917 nvmlInit(nvmlLib_content)
6918
6919 device_count = nvmlDeviceGetCount()
6920
6921 data["gpu"] = {
6922 "count": device_count,
6923 "driver": nvmlSystemGetDriverVersion(),
6924 "cuda_driver": nvmlSystemGetCudaDriverVersion(),
6925 "details": []
6926 }
6927
6928 for i in range(device_count):
6929 handle = nvmlDeviceGetHandleByIndex(i)
6930 # graphic_clock = nvmlDeviceGetDefaultApplicationsClock(handle, NVML_CLOCK_GRAPHICS)
6931 # memory_clock = nvmlDeviceGetDefaultApplicationsClock(handle, NVML_CLOCK_MEM)
6932 # memory_clocks = nvmlDeviceGetSupportedMemoryClocks(handle)
6933 # print(graphic_clock)
6934 # print(memory_clock)
6935 # print(memory_clocks)
6936
6937 cuda_compute_capability = nvmlDeviceGetCudaComputeCapability(handle)
6938 major = cuda_compute_capability[0]
6939 minor = cuda_compute_capability[1]
6940
6941 # Get GPU utilization rates
6942 utilization = nvmlDeviceGetUtilizationRates(handle)
6943
6944 data["gpu"]["details"].append(
6945 {
6946 "name": nvmlDeviceGetName(handle),
6947 "uuid": nvmlDeviceGetUUID(handle),
6948 "capacity": nvmlDeviceGetMemoryInfo(handle).total / (1024 ** 2), # in MB
6949 "cuda": f"{major}.{minor}",
6950 "power_limit": nvmlDeviceGetPowerManagementLimit(handle) / 1000,
6951 "graphics_speed": nvmlDeviceGetClockInfo(handle, NVML_CLOCK_GRAPHICS),
6952 "memory_speed": nvmlDeviceGetClockInfo(handle, NVML_CLOCK_MEM),
6953 "pcie": nvmlDeviceGetCurrPcieLinkWidth(handle),
6954 "pcie_speed": nvmlDeviceGetPcieSpeed(handle),
6955 "gpu_utilization": utilization.gpu,
6956 "memory_utilization": utilization.memory,
6957 }
6958 )
6959
6960 processes = nvmlDeviceGetComputeRunningProcesses_v2(handle)
6961
6962 # Collect process IDs
6963 for proc in processes:
6964 gpu_process_ids.add(proc.pid)
6965
6966 nvmlShutdown()
6967 except Exception as exc:
6968 # print(f'Error getting os specs: {exc}', flush=True)
6969 data["gpu_scrape_error"] = repr(exc)
6970
6971 # Scrape the NVIDIA Container Runtime config
6972 nvidia_cfg_cmd = 'cat /etc/nvidia-container-runtime/config.toml'
6973 try:
6974 data["nvidia_cfg"] = run_cmd(nvidia_cfg_cmd)
6975 except Exception as exc:
6976 data["nvidia_cfg_scrape_error"] = repr(exc)
6977
6978 # Scrape the Docker Daemon config
6979 docker_cfg_cmd = 'cat /etc/docker/daemon.json'
6980 try:
6981 data["docker_cfg"] = run_cmd(docker_cfg_cmd)
6982 except Exception as exc:
6983 data["docker_cfg_scrape_error"] = repr(exc)
6984
6985 docker_content = get_file_content("/usr/bin/docker")
6986 data["docker"] = get_docker_info(docker_content)
6987
6988 data['gpu_processes'] = get_gpu_processes(gpu_process_ids, data["docker"]["containers"])
6989
6990 data["cpu"] = {"count": 0, "model": "", "clocks": []}
6991 try:
6992 lscpu_output = run_cmd("lscpu")
6993 data["cpu"]["model"] = re.search(r"Model name:\s*(.*)$", lscpu_output, re.M).group(1)
6994 data["cpu"]["count"] = int(re.search(r"CPU\(s\):\s*(.*)", lscpu_output).group(1))
6995 data["cpu"]["utilization"] = psutil.cpu_percent(interval=1)
6996 except Exception as exc:
6997 # print(f'Error getting cpu specs: {exc}', flush=True)
6998 data["cpu_scrape_error"] = repr(exc)
6999
7000 data["ram"] = {}
7001 try:
7002 # with open("/proc/meminfo") as f:
7003 # meminfo = f.read()
7004
7005 # for name, key in [
7006 # ("MemAvailable", "available"),
7007 # ("MemFree", "free"),
7008 # ("MemTotal", "total"),
7009 # ]:
7010 # data["ram"][key] = int(re.search(rf"^{name}:\s*(\d+)\s+kB$", meminfo, re.M).group(1))
7011 # data["ram"]["used"] = data["ram"]["total"] - data["ram"]["available"]
7012 # data['ram']['utilization'] = (data["ram"]["used"] / data["ram"]["total"]) * 100
7013
7014 mem = psutil.virtual_memory()
7015 data["ram"] = {
7016 "total": mem.total / 1024, # in kB
7017 "free": mem.free / 1024,
7018 "used": mem.free / 1024,
7019 "available": mem.available / 1024,
7020 "utilization": mem.percent
7021 }
7022 except Exception as exc:
7023 # print(f"Error reading /proc/meminfo; Exc: {exc}", file=sys.stderr)
7024 data["ram_scrape_error"] = repr(exc)
7025
7026 data["hard_disk"] = {}
7027 try:
7028 disk_usage = shutil.disk_usage(".")
7029 data["hard_disk"] = {
7030 "total": disk_usage.total // 1024, # in kB
7031 "used": disk_usage.used // 1024,
7032 "free": disk_usage.free // 1024,
7033 "utilization": (disk_usage.used / disk_usage.total) * 100
7034 }
7035 except Exception as exc:
7036 # print(f"Error getting disk_usage from shutil: {exc}", file=sys.stderr)
7037 data["hard_disk_scrape_error"] = repr(exc)
7038
7039 data["os"] = ""
7040 try:
7041 data["os"] = run_cmd('lsb_release -d | grep -Po "Description:\\s*\\K.*"').strip()
7042 except Exception as exc:
7043 # print(f'Error getting os specs: {exc}', flush=True)
7044 data["os_scrape_error"] = repr(exc)
7045
7046 data["network"] = get_network_speed()
7047
7048 data["md5_checksums"] = {
7049 "nvidia_smi": get_md5_checksum_from_path(run_cmd("which nvidia-smi").strip()),
7050 "libnvidia_ml": get_md5_checksum_from_file_content(nvmlLib_content),
7051 "docker": get_md5_checksum_from_file_content(docker_content),
7052 }
7053
7054 return data
7055
7056
7057def _encrypt(key: str, payload: str) -> str:
7058 key_bytes = b64encode(hashlib.sha256(key.encode('utf-8')).digest(), altchars=b"-_")
7059 return Fernet(key_bytes).encrypt(payload.encode("utf-8")).decode("utf-8")
7060
7061
7062key = 'encrypt_key'
7063machine_specs = get_machine_specs()
7064encoded_str = _encrypt(key, json.dumps(machine_specs))
7065print(encoded_str)
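
The _encrypt helper derives a Fernet key by SHA-256-hashing the shared passphrase and URL-safe base64-encoding the digest. The consuming side is not part of this excerpt; a minimal decryption sketch under the assumption that the validator knows the same passphrase (the value substituted for 'encrypt_key'):

# Hedged sketch: reversing _encrypt() with the same key derivation.
import hashlib
import json
from base64 import b64encode
from cryptography.fernet import Fernet

def decrypt_specs(key: str, token: str) -> dict:
    key_bytes = b64encode(hashlib.sha256(key.encode("utf-8")).digest(), altchars=b"-_")
    return json.loads(Fernet(key_bytes).decrypt(token.encode("utf-8")).decode("utf-8"))
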
7066
7067
7068
7069---
7070File: /neurons/validators/src/miner_jobs/score.py
7071---
7072
7073import sys
7074import os
7075import subprocess
7076import tempfile
7077import json
7078import hashlib
7079from base64 import b64encode
7080import asyncio
7081
7082
7083def gen_hash(s: bytes) -> bytes:
7084 return b64encode(hashlib.sha256(s).digest(), altchars=b"-_")
7085
7086
7087payload = sys.argv[1]
7088data = json.loads(payload)
7089
7090gpu_count = data["gpu_count"]
7091num_job_params = data["num_job_params"]
7092jobs = data["jobs"]
7093timeout = data["timeout"]
7094
7095
7096def run_hashcat(device_id: int, job: dict) -> list[str]:
7097 answers = []
7098 for i in range(num_job_params):
7099 payload = job["payloads"][i]
7100 mask = job["masks"][i]
7101 algorithm = job["algorithms"][i]
7102
7103 with tempfile.NamedTemporaryFile(delete=True, suffix='.txt') as payload_file:
7104 payload_file.write(payload.encode('utf-8'))
7105 payload_file.flush()
7106 os.fsync(payload_file.fileno())
7107
7108 if not os.path.exists(f"/usr/bin/hashcat{device_id}"):
7109 subprocess.check_output(f"cp /usr/bin/hashcat /usr/bin/hashcat{device_id}", shell=True)
7110
7111 cmd = f'hashcat{device_id} --potfile-disable --restore-disable --attack-mode 3 -d {device_id} --workload-profile 3 --optimized-kernel-enable --hash-type {algorithm} --hex-salt -1 "?l?d?u" --outfile-format 2 --quiet {payload_file.name} "{mask}"'
7112 stdout = subprocess.check_output(cmd, shell=True, text=True)
7113 passwords = [p for p in sorted(stdout.split("\n")) if p != ""]
7114 answers.append(passwords)
7115
7116 return answers
7117
7118
7119async def run_jobs():
7120 tasks = [
7121 asyncio.to_thread(
7122 run_hashcat,
7123 i+1,
7124 jobs[i]
7125 )
7126 for i in range(gpu_count)
7127 ]
7128
7129 results = await asyncio.wait_for(asyncio.gather(*tasks, return_exceptions=True), timeout=timeout)
7130 result = {
7131 "answer": gen_hash("".join([
7132 "".join([
7133 "".join(passwords)
7134 for passwords in answers
7135 ])
7136 for answers in results
7137 ]).encode("utf-8")).decode("utf-8")
7138 }
7139
7140 print(json.dumps(result))
7141
7142if __name__ == "__main__":
7143 asyncio.run(run_jobs())
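
score.py prints a single base64-encoded SHA-256 digest over the concatenation of every cracked password (sorted within each job). Verification on the validator side therefore reduces to recomputing the same digest over the expected passwords and comparing strings; a minimal sketch under that assumption (names are illustrative):

# Hedged sketch: recomputing the digest score.py prints, for comparison against
# HashService.answer on the validator side (shown later in this report).
# passwords_per_gpu mirrors the nesting in run_jobs(): per GPU -> per job param -> sorted passwords.
import hashlib
from base64 import b64encode

def expected_answer(passwords_per_gpu: list[list[list[str]]]) -> str:
    joined = "".join(
        "".join("".join(pw_list) for pw_list in per_gpu)
        for per_gpu in passwords_per_gpu
    )
    return b64encode(hashlib.sha256(joined.encode("utf-8")).digest(), altchars=b"-_").decode("utf-8")
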
7144
7145
7146
7147---
7148File: /neurons/validators/src/models/__init__.py
7149---
7150
7151
7152
7153
7154---
7155File: /neurons/validators/src/models/executor.py
7156---
7157
7158from typing import Optional
7159from uuid import UUID, uuid4
7160from sqlmodel import Field, SQLModel
7161
7162
7163class Executor(SQLModel, table=True):
7164 """Miner model."""
7165
7166 uuid: UUID | None = Field(default_factory=uuid4, primary_key=True)
7167 miner_address: str
7168 miner_port: int
7169 miner_hotkey: str
7170 executor_id: UUID
7171 executor_ip_address: str
7172 executor_ssh_username: str
7173 executor_ssh_port: int
7174 rented: Optional[bool] = None
7175
7176
7177
7178---
7179File: /neurons/validators/src/models/task.py
7180---
7181
7182from typing import Optional
7183import enum
7184import uuid
7185from uuid import UUID
7186from datetime import datetime
7187
7188from sqlmodel import Column, Enum, Field, SQLModel
7189
7190
7191class TaskStatus(str, enum.Enum):
7192 Initiated = "Initiated"
7193 SSHConnected = "SSHConnected"
7194 Failed = "Failed"
7195 Finished = "Finished"
7196
7197
7198class Task(SQLModel, table=True):
7199 """Task model."""
7200
7201 uuid: UUID | None = Field(default_factory=uuid.uuid4, primary_key=True)
7202 task_status: TaskStatus = Field(sa_column=Column(Enum(TaskStatus)))
7203 miner_hotkey: str
7204 executor_id: UUID
7205 created_at: datetime = Field(default_factory=datetime.utcnow)
7206 proceed_time: Optional[int] = Field(default=None)
7207 score: Optional[float] = None
7208
7209
7210
7211---
7212File: /neurons/validators/src/payload_models/__init__.py
7213---
7214
7215
7216
7217
7218---
7219File: /neurons/validators/src/payload_models/payloads.py
7220---
7221
7222import enum
7223
7224from datura.requests.base import BaseRequest
7225from pydantic import BaseModel, field_validator
7226
7227
7228class CustomOptions(BaseModel):
7229 volumes: list[str] | None = None
7230 environment: dict[str, str] | None = None
7231 entrypoint: str | None = None
7232 internal_ports: list[int] | None = None
7233 startup_commands: str | None = None
7234
7235
7236class MinerJobRequestPayload(BaseModel):
7237 job_batch_id: str
7238 miner_hotkey: str
7239 miner_address: str
7240 miner_port: int
7241
7242
7243class MinerJobEnryptedFiles(BaseModel):
7244 encrypt_key: str
7245 tmp_directory: str
7246 machine_scrape_file_name: str
7247 score_file_name: str
7248
7249
7250class ResourceType(BaseModel):
7251 cpu: int
7252 gpu: int
7253 memory: str
7254 volume: str
7255
7256 @field_validator("cpu", "gpu")
7257 def validate_positive_int(cls, v: int) -> int:
7258 if v < 0:
7259 raise ValueError(f"{v} should be a valid non-negative integer string.")
7260 return v
7261
7262 @field_validator("memory", "volume")
7263 def validate_memory_format(cls, v: str) -> str:
7264 if not v[:-2].isdigit() or v[-2:].upper() not in ["MB", "GB"]:
7265 raise ValueError(f"{v} is not a valid format.")
7266 return v
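
The two validators above reject negative cpu/gpu counts and memory/volume strings that are not an integer followed by MB or GB. A short usage sketch (pydantic raises ValidationError on failure):

# Usage sketch for the ResourceType validators above.
from pydantic import ValidationError

ResourceType(cpu=4, gpu=1, memory="16GB", volume="100GB")     # accepted

try:
    ResourceType(cpu=4, gpu=1, memory="16 gigabytes", volume="100GB")
except ValidationError as exc:
    print(exc)   # "16 gigabytes is not a valid format."
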
7267
7268
7269class ContainerRequestType(enum.Enum):
7270 ContainerCreateRequest = "ContainerCreateRequest"
7271 ContainerStartRequest = "ContainerStartRequest"
7272 ContainerStopRequest = "ContainerStopRequest"
7273 ContainerDeleteRequest = "ContainerDeleteRequest"
7274 DuplicateExecutorsResponse = "DuplicateExecutorsResponse"
7275
7276
7277class ContainerBaseRequest(BaseRequest):
7278 message_type: ContainerRequestType
7279 miner_hotkey: str
7280 miner_address: str | None = None
7281 miner_port: int | None = None
7282 executor_id: str
7283
7284
7285class ContainerCreateRequest(ContainerBaseRequest):
7286 message_type: ContainerRequestType = ContainerRequestType.ContainerCreateRequest
7287 docker_image: str
7288 user_public_key: str
7289 custom_options: CustomOptions | None = None
7290 debug: bool | None = None
7291
7292
7293class ContainerStartRequest(ContainerBaseRequest):
7294 message_type: ContainerRequestType = ContainerRequestType.ContainerStartRequest
7295 container_name: str
7296
7297
7298class ContainerStopRequest(ContainerBaseRequest):
7299 message_type: ContainerRequestType = ContainerRequestType.ContainerStopRequest
7300 container_name: str
7301
7302
7303class ContainerDeleteRequest(ContainerBaseRequest):
7304 message_type: ContainerRequestType = ContainerRequestType.ContainerDeleteRequest
7305 container_name: str
7306 volume_name: str
7307
7308
7309class ContainerResponseType(enum.Enum):
7310 ContainerCreated = "ContainerCreated"
7311 ContainerStarted = "ContainerStarted"
7312 ContainerStopped = "ContainerStopped"
7313 ContainerDeleted = "ContainerDeleted"
7314 FailedRequest = "FailedRequest"
7315
7316
7317class ContainerBaseResponse(BaseRequest):
7318 message_type: ContainerResponseType
7319 miner_hotkey: str
7320 executor_id: str
7321
7322
7323class ContainerCreatedResult(BaseModel):
7324 container_name: str
7325 volume_name: str
7326 port_maps: list[tuple[int, int]]
7327
7328
7329class ContainerCreated(ContainerBaseResponse, ContainerCreatedResult):
7330 message_type: ContainerResponseType = ContainerResponseType.ContainerCreated
7331
7332
7333class ContainerStarted(ContainerBaseResponse):
7334 message_type: ContainerResponseType = ContainerResponseType.ContainerStarted
7335 container_name: str
7336
7337
7338class ContainerStopped(ContainerBaseResponse):
7339 message_type: ContainerResponseType = ContainerResponseType.ContainerStopped
7340 container_name: str
7341
7342
7343class ContainerDeleted(ContainerBaseResponse):
7344 message_type: ContainerResponseType = ContainerResponseType.ContainerDeleted
7345 container_name: str
7346 volume_name: str
7347
7348
7349class FailedContainerErrorCodes(enum.Enum):
7350 UnknownError = "UnknownError"
7351 ContainerNotRunning = "ContainerNotRunning"
7352 NoPortMappings = "NoPortMappings"
7353 InvalidExecutorId = "InvalidExecutorId"
7354 ExceptionError = "ExceptionError"
7355 FailedMsgFromMiner = "FailedMsgFromMiner"
7356
7357
7358class FailedContainerRequest(ContainerBaseResponse):
7359 message_type: ContainerResponseType = ContainerResponseType.FailedRequest
7360 msg: str
7361 error_code: FailedContainerErrorCodes | None = None
7362
7363
7364class DuplicateExecutorsResponse(BaseModel):
7365 message_type: ContainerRequestType = ContainerRequestType.DuplicateExecutorsResponse
7366 executors: dict[str, list]
7367
7368
7369
7370---
7371File: /neurons/validators/src/protocol/vc_protocol/__init__.py
7372---
7373
7374
7375
7376
7377---
7378File: /neurons/validators/src/protocol/vc_protocol/compute_requests.py
7379---
7380
7381from typing import Literal
7382
7383from pydantic import BaseModel
7384
7385
7386class Error(BaseModel, extra="allow"):
7387 msg: str
7388 type: str
7389 help: str = ""
7390
7391
7392class Response(BaseModel, extra="forbid"):
7393 """Message sent from compute app to validator in response to AuthenticateRequest"""
7394
7395 status: Literal["error", "success"]
7396 errors: list[Error] = []
7397
7398
7399class RentedMachine(BaseModel):
7400 miner_hotkey: str
7401 executor_id: str
7402 executor_ip_address: str
7403 executor_ip_port: str
7404
7405
7406class RentedMachineResponse(BaseModel):
7407 machines: list[RentedMachine]
7408
7409
7410
7411---
7412File: /neurons/validators/src/protocol/vc_protocol/validator_requests.py
7413---
7414
7415import enum
7416import json
7417import time
7418
7419import bittensor
7420import pydantic
7421from datura.requests.base import BaseRequest
7422
7423
7424class RequestType(enum.Enum):
7425 AuthenticateRequest = "AuthenticateRequest"
7426 MachineSpecRequest = "MachineSpecRequest"
7427 ExecutorSpecRequest = "ExecutorSpecRequest"
7428 RentedMachineRequest = "RentedMachineRequest"
7429 LogStreamRequest = "LogStreamRequest"
7430 DuplicateExecutorsRequest = "DuplicateExecutorsRequest"
7431
7432
7433class BaseValidatorRequest(BaseRequest):
7434 message_type: RequestType
7435
7436
7437class AuthenticationPayload(pydantic.BaseModel):
7438 validator_hotkey: str
7439 timestamp: int
7440
7441 def blob_for_signing(self):
7442 instance_dict = self.model_dump()
7443 return json.dumps(instance_dict, sort_keys=True)
7444
7445
7446class AuthenticateRequest(BaseValidatorRequest):
7447 message_type: RequestType = RequestType.AuthenticateRequest
7448 payload: AuthenticationPayload
7449 signature: str
7450
7451 def blob_for_signing(self):
7452 return self.payload.blob_for_signing()
7453
7454 @classmethod
7455 def from_keypair(cls, keypair: bittensor.Keypair):
7456 payload = AuthenticationPayload(
7457 validator_hotkey=keypair.ss58_address,
7458 timestamp=int(time.time()),
7459 )
7460 return cls(payload=payload, signature=f"0x{keypair.sign(payload.blob_for_signing()).hex()}")
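
AuthenticateRequest signs the canonical JSON blob of the payload with the validator hotkey and prefixes the hex signature with 0x. The verifying side (the compute app) is not included in this excerpt; a hedged sketch of how such a request could be checked, assuming the standard bittensor Keypair API:

# Hedged sketch: verifying an AuthenticateRequest on the receiving side.
# The actual compute-app verification code is not part of this excerpt.
import bittensor

def verify_authenticate_request(request: AuthenticateRequest) -> bool:
    keypair = bittensor.Keypair(ss58_address=request.payload.validator_hotkey)
    signature = bytes.fromhex(request.signature.removeprefix("0x"))
    return keypair.verify(request.blob_for_signing(), signature)
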
7461
7462
7463class ExecutorSpecRequest(BaseValidatorRequest):
7464 message_type: RequestType = RequestType.ExecutorSpecRequest
7465 miner_hotkey: str
7466 validator_hotkey: str
7467 executor_uuid: str
7468 executor_ip: str
7469 executor_port: int
7470 specs: dict | None
7471 score: float | None
7472 synthetic_job_score: float | None
7473 log_text: str | None
7474 log_status: str | None
7475 job_batch_id: str
7476
7477
7478class RentedMachineRequest(BaseValidatorRequest):
7479 message_type: RequestType = RequestType.RentedMachineRequest
7480
7481
7482class LogStreamRequest(BaseValidatorRequest):
7483 message_type: RequestType = RequestType.LogStreamRequest
7484 miner_hotkey: str
7485 validator_hotkey: str
7486 executor_uuid: str
7487 logs: list[dict]
7488
7489
7490class DuplicateExecutorsRequest(BaseValidatorRequest):
7491 message_type: RequestType = RequestType.DuplicateExecutorsRequest
7492
7493
7494
7495---
7496File: /neurons/validators/src/protocol/__init__.py
7497---
7498
7499
7500
7501
7502---
7503File: /neurons/validators/src/routes/__init__.py
7504---
7505
7506
7507
7508
7509---
7510File: /neurons/validators/src/routes/apis.py
7511---
7512
7513from fastapi import APIRouter, Response
7514from payload_models.payloads import ContainerCreateRequest, MinerJobRequestPayload
7515
7516from services.miner_service import MinerServiceDep
7517from services.task_service import TaskServiceDep
7518
7519apis_router = APIRouter()
7520
7521
7522@apis_router.post("/miner_job_request")
7523async def request_job_to_miner(payload: MinerJobRequestPayload, miner_service: MinerServiceDep):
7524 """Requesting resource to miner."""
7525 await miner_service.request_job_to_miner(payload)
7526
7527
7528@apis_router.post("/create_container_to_miner")
7529async def create_container_to_miner(
7530 payload: ContainerCreateRequest, miner_service: MinerServiceDep
7531):
7532 """Requesting resource to miner."""
7533 await miner_service.handle_container(payload)
7534
7535
7536@apis_router.get("/tasks/{uuid}/download")
7537async def download_private_key_for_task(uuid: str, task_service: TaskServiceDep):
7538 """Download private key for given task."""
7539 private_key: str = await task_service.get_decrypted_private_key_for_task(uuid)
7540 if not private_key:
7541 return Response(content="No private key found", media_type="text/plain", status_code=404)
7542 return Response(
7543 content=private_key,
7544 media_type="application/octet-stream",
7545 headers={
7546 "Content-Disposition": "attachment; filename=private_key",
7547 },
7548 )
7549
7550
7551
7552---
7553File: /neurons/validators/src/services/const.py
7554---
7555
7556MIN_JOB_TAKEN_TIME = 20
7557
7558GPU_MAX_SCORES = {
7559 # Latest Gen NVIDIA GPUs (Averaged if applicable)
7560 "NVIDIA H200": 4.65,
7561 "NVIDIA H100 80GB HBM3": 3.49,
7562 "NVIDIA H100 NVL": 2.79,
7563 "NVIDIA H100 PCIe": 2.69,
7564 "NVIDIA GeForce RTX 4090": 0.69,
7565 "NVIDIA GeForce RTX 4090 D": 0.62,
7566 "NVIDIA RTX 4000 Ada Generation": 0.38,
7567 "NVIDIA RTX 6000 Ada Generation": 1.03,
7568 "NVIDIA L4": 0.43,
7569 "NVIDIA L40S": 1.03,
7570 "NVIDIA L40": 0.99,
7571 "NVIDIA RTX 2000 Ada Generation": 0.28,
7572 # Previous Gen NVIDIA GPUs (Averaged if applicable)
7573 "NVIDIA A100 80GB PCIe": 1.64,
7574 "NVIDIA A100-SXM4-80GB": 1.89,
7575 "NVIDIA RTX A6000": 0.76,
7576 "NVIDIA RTX A5000": 0.43,
7577 "NVIDIA RTX A4500": 0.35,
7578 "NVIDIA RTX A4000": 0.32,
7579 "NVIDIA A40": 0.39,
7580 "NVIDIA A30": 0.35,
7581 "NVIDIA GeForce RTX 3090": 0.43,
7582}
7583
7584MAX_UPLOAD_SPEED = 1000
7585MAX_DOWNLOAD_SPEED = 1000
7586
7587JOB_TAKEN_TIME_WEIGHT = 0.9
7588UPLOAD_SPEED_WEIGHT = 0.05
7589DOWNLOAD_SPEED_WEIGHT = 0.05
7590
7591MAX_GPU_COUNT = 14
7592
7593UNRENTED_MULTIPLIER = 1
7594
7595GPU_UTILIZATION_LIMIT = 1
7596GPU_MEMORY_UTILIZATION_LIMIT = 1
7597
7598VERIFY_JOB_REQUIRED_COUNT = 2 * 24 * 6
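
The constants above are consumed by the validator's scoring service, which is not part of this excerpt. Purely as a hedged illustration of how weights of this shape are typically combined (the subnet's actual formula may differ), a normalized weighted sum would look like:

# Purely illustrative: one plausible combination of the weights above.
# The real scoring code is not shown in this excerpt and may differ.
def illustrative_score(gpu_model: str, gpu_count: int, job_time_factor: float,
                       upload_mbps: float, download_mbps: float) -> float:
    base = GPU_MAX_SCORES.get(gpu_model, 0) * min(gpu_count, MAX_GPU_COUNT)
    return base * (
        JOB_TAKEN_TIME_WEIGHT * job_time_factor
        + UPLOAD_SPEED_WEIGHT * min(upload_mbps / MAX_UPLOAD_SPEED, 1)
        + DOWNLOAD_SPEED_WEIGHT * min(download_mbps / MAX_DOWNLOAD_SPEED, 1)
    )
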
7599
7600HASHCAT_CONFIGS = {
7601 "NVIDIA RTX A5000": {
7602 "digits": 11,
7603 "average_time": [
7604 24.251156330108643,
7605 24.459399509429932,
7606 25.07683423360189,
7607 26.078879714012146,
7608 27.233995351791386,
7609 27.801182564099634,
7610 29.58513449941363,
7611 30.492227721214295,
7612 ],
7613 },
7614 "NVIDIA RTX A6000": {
7615 "digits": 11,
7616 "average_time": [
7617 24.251156330108643,
7618 24.459399509429932,
7619 25.07683423360189,
7620 26.078879714012146,
7621 27.233995351791386,
7622 27.801182564099634,
7623 29.58513449941363,
7624 30.492227721214295,
7625 ],
7626 },
7627 "NVIDIA RTX A4500": {
7628 "digits": 11,
7629 "average_time": [
7630 24.251156330108643,
7631 24.459399509429932,
7632 25.07683423360189,
7633 26.078879714012146,
7634 27.233995351791386,
7635 27.801182564099634,
7636 29.58513449941363,
7637 30.492227721214295,
7638 ],
7639 },
7640 "NVIDIA RTX A4000": {
7641 "digits": 11,
7642 "average_time": [
7643 32.62807669639587,
7644 33.436131143569945,
7645 33.88327717781067,
7646 34.187138891220094,
7647 35.52240489006042,
7648 37.14521159331004,
7649 39.016253103528705,
7650 40.42734135985374,
7651 ],
7652 },
7653 "NVIDIA GeForce RTX 3090": {
7654 "digits": 11,
7655 "average_time": [
7656 22.13358383178711,
7657 24.477075362205504,
7658 26.968040720621747,
7659 29.163842380046844,
7660 31.934904451370237,
7661 34.341850678126015,
7662 37.18430421011789,
7663 39.15856931209564,
7664 ],
7665 },
7666 "NVIDIA RTX 6000 Ada Generation": {
7667 "digits": 11,
7668 "average_time": [
7669 12.016858005523682,
7670 13.232668924331666,
7671 14.015261713663739,
7672 14.904895508289338,
7673 15.89838502883911,
7674 16.701006396611533,
7675 18.079130056926182,
7676 19.553341883420945,
7677 ],
7678 },
7679 "NVIDIA L40S": {
7680 "digits": 11,
7681 "average_time": [
7682 10.906689882278442,
7683 9.32911479473114,
7684 12.892356348037719,
7685 13.338897478580474,
7686 14.28122389793396,
7687 15.280945293108621,
7688 15.630833080836705,
7689 17.76026642918587,
7690 ],
7691 },
7692 "NVIDIA L40": {
7693 "digits": 11,
7694 "average_time": [
7695 10.906689882278442,
7696 9.32911479473114,
7697 12.892356348037719,
7698 13.338897478580474,
7699 14.28122389793396,
7700 15.280945293108621,
7701 15.630833080836705,
7702 17.76026642918587,
7703 ],
7704 },
7705 "NVIDIA L4": {
7706 "digits": 11,
7707 "average_time": [
7708 27.768908500671387,
7709 27.90283513069153,
7710 27.773880004882812,
7711 27.653605222702026,
7712 27.88539433479309,
7713 27.88539433479309,
7714 27.88539433479309,
7715 27.88539433479309,
7716 ],
7717 },
7718 "NVIDIA RTX 4000 Ada Generation": {
7719 "digits": 11,
7720 "average_time": [
7721 23.84185085296631,
7722 25.37116765975952,
7723 25.933285299936934,
7724 27.255381512641907,
7725 28.95430653572082,
7726 30.480634721120204,
7727 32.16756559780665,
7728 33.507733607292174,
7729 ],
7730 },
7731 "NVIDIA H100 PCIe": {
7732 "digits": 11,
7733 "average_time": [
7734 18.3540611743927,
7735 17.581688284873962,
7736 19.558610963821412,
7737 23.779386079311372,
7738 25.929840545654294,
7739 28.815886704126996,
7740 29.60572577885219,
7741 33.850944715738294,
7742 ],
7743 },
7744 "NVIDIA H100 NVL": {
7745 "digits": 11,
7746 "average_time": [
7747 18.3540611743927,
7748 17.581688284873962,
7749 19.558610963821412,
7750 23.779386079311372,
7751 25.929840545654294,
7752 28.815886704126996,
7753 29.60572577885219,
7754 33.850944715738294,
7755 ],
7756 },
7757 "NVIDIA H100 80GB HBM3": {
7758 "digits": 11,
7759 "average_time": [
7760 18.3540611743927,
7761 17.581688284873962,
7762 19.558610963821412,
7763 23.779386079311372,
7764 25.929840545654294,
7765 28.815886704126996,
7766 29.60572577885219,
7767 33.850944715738294,
7768 ],
7769 },
7770 "NVIDIA A100 80GB PCIe": {
7771 "digits": 11,
7772 "average_time": [
7773 18.69497232437134,
7774 20.42860324382782,
7775 22.53571968078613,
7776 25.373827075958253,
7777 26.749426555633544,
7778 31.196198654174804,
7779 32.80575948442732,
7780 37.11309432387352,
7781 ],
7782 },
7783 "NVIDIA A100-SXM4-80GB": {
7784 "digits": 11,
7785 "average_time": [
7786 18.69497232437134,
7787 20.42860324382782,
7788 22.53571968078613,
7789 25.373827075958253,
7790 26.749426555633544,
7791 31.196198654174804,
7792 32.80575948442732,
7793 37.11309432387352,
7794 ],
7795 },
7796 "NVIDIA A40": {
7797 "digits": 11,
7798 "average_time": [
7799 22.828101253509523,
7800 23.189609861373903,
7801 21.3694882551829,
7802 23.657343721389772,
7803 28.178246479034424,
7804 27.75535701115926,
7805 30.86851720128741,
7806 34.388632106781,
7807 ],
7808 },
7809 "NVIDIA A30": {
7810 "digits": 11,
7811 "average_time": [
7812 22.828101253509523,
7813 23.189609861373903,
7814 21.3694882551829,
7815 23.657343721389772,
7816 28.178246479034424,
7817 27.75535701115926,
7818 30.86851720128741,
7819 34.388632106781,
7820 ],
7821 },
7822 "NVIDIA RTX 2000 Ada Generation": {
7823 "digits": 11,
7824 "average_time": [
7825 22.828101253509523,
7826 23.189609861373903,
7827 21.3694882551829,
7828 23.657343721389772,
7829 28.178246479034424,
7830 27.75535701115926,
7831 30.86851720128741,
7832 34.388632106781,
7833 ],
7834 },
7835 "NVIDIA GeForce RTX 4090 D": {
7836 "digits": 11,
7837 "average_time": [
7838 12.535813426971435,
7839 13.367040371894836,
7840 14.397390270233155,
7841 15.773727321624756,
7842 16.52033654212952,
7843 18.87070236206055,
7844 20.572682762145995,
7845 22.169760519266127,
7846 ],
7847 },
7848 "NVIDIA GeForce RTX 4090": {
7849 "digits": 11,
7850 "average_time": [
7851 11.02204384803772,
7852 11.871551060676575,
7853 12.621799103418986,
7854 13.46524715423584,
7855 14.425264406204224,
7856 12.915648317337036,
7857 16.706109033312117,
7858 17.858580154180526,
7859 ],
7860 },
7861 "NVIDIA H200": {
7862 "digits": 11,
7863 "average_time": [
7864 13.78846188,
7865 13.20821786,
7866 14.69337816,
7867 17.86422935,
7868 19.47975516,
7869 21.64789315,
7870 22.24125861,
7871 25.43047319,
7872 ],
7873 },
7874}
7875
7876LIB_NVIDIA_ML_DIGESTS = {
7877 "535.183.01": "58fc46eefa8ebb265293556951a75a39",
7878 "535.183.06": "03ed7fa2134095b32f9d0d24a774c6ba",
7879 "535.216.01": "96479a06139fc5261d06f432970d6a7b",
7880 "535.216.03": "189634bf960b9a2efe1af8011d27ccf7",
7881 "535.230.02": "cc34ae85c2238b9a49067e683c1998cf",
7882 "545.23.06": "5ad33588e91af67139efb54fe9fefc68",
7883 "545.29.06": "85ad949d7553ab96cce5c811e229c7c7",
7884 "550.120": "48be49d0e792b5ee76f73857c0bef35a",
7885 "550.127.05": "bfa2733eee442016792bcbf130156e3d",
7886 "550.54.15": "9625642dcf8765f52e332c8e38fbef73",
7887 "550.78": "1f335d1f068931fe7f2ce13117d1602b",
7888 "550.90.07": "c95828f8a8ab7f17743b40561b812c96",
7889 "550.90.12": "d7702d394ab213a725abeb345185a072",
7890 "555.42.02": "0262f396e80847dccefc8ccf52cff1ae",
7891 "555.42.06": "69774adffa76471490e6d8fac9067725",
7892 "560.28.03": "6d6e0122cff1ac777a9e37ba09b886cb",
7893 "560.35.03": "93a3f8ef77af86b79314c00b0788aeed",
7894 "560.35.05": "1eec299b50e33a6cfa5155ded53495ab",
7895 "565.57.01": "c801dd3fc4660f3a8ddf977cfdffe113",
7896 "550.127.08": "ac925f2cd192ad971c5466d55945a243",
7897 "550.142": "e68b535a61be6434fc7f12450561a3d0"
7898}
7899
7900DOCKER_DIGESTS = {
7901 "26.1.3": "52d8fcc2c4370bf324cdf17cbc586784",
7902 "27.3.1": "40f1f7724fa0432ea6878692a05b998c",
7903}
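
LIB_NVIDIA_ML_DIGESTS and DOCKER_DIGESTS pair with the MD5 helpers in machine_scrape.py: the validator can cross-check the checksums a miner reports against these allowlists. A minimal sketch, keyed by the field names machine_scrape.py emits:

# Sketch: checking a scraped libnvidia-ml checksum against the allowlist,
# keyed by the driver version reported in the same specs payload.
def is_known_libnvidia_ml(specs: dict) -> bool:
    driver = specs.get("gpu", {}).get("driver")
    checksum = specs.get("md5_checksums", {}).get("libnvidia_ml")
    return checksum is not None and LIB_NVIDIA_ML_DIGESTS.get(driver) == checksum
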
7904
7905
7906
7907---
7908File: /neurons/validators/src/services/docker_service.py
7909---
7910
7911import asyncio
7912import logging
7913import time
7914from typing import Annotated
7915from uuid import uuid4
7916
7917import aiohttp
7918import asyncssh
7919import bittensor
7920from datura.requests.miner_requests import ExecutorSSHInfo
7921from fastapi import Depends
7922from payload_models.payloads import (
7923 ContainerCreatedResult,
7924 ContainerCreateRequest,
7925 ContainerDeleteRequest,
7926 ContainerStartRequest,
7927 ContainerStopRequest,
7928 FailedContainerErrorCodes,
7929 FailedContainerRequest,
7930)
7931from protocol.vc_protocol.compute_requests import RentedMachine
7932
7933from core.utils import _m, get_extra_info
7934from services.redis_service import (
7935 AVAILABLE_PORT_MAPS_PREFIX,
7936 STREAMING_LOG_CHANNEL,
7937 RedisService,
7938)
7939from services.ssh_service import SSHService
7940
7941logger = logging.getLogger(__name__)
7942
7943REPOSITORIES = [
7944 "daturaai/compute-subnet-executor:latest",
7945 "daturaai/compute-subnet-executor-runner:latest",
7946 "containrrr/watchtower:1.7.1",
7947 "daturaai/pytorch",
7948 "daturaai/ubuntu",
7949]
7950
7951LOG_STREAM_INTERVAL = 5 # 5 seconds
7952
7953
7954class DockerService:
7955 def __init__(
7956 self,
7957 ssh_service: Annotated[SSHService, Depends(SSHService)],
7958 redis_service: Annotated[RedisService, Depends(RedisService)],
7959 ):
7960 self.ssh_service = ssh_service
7961 self.redis_service = redis_service
7962 self.lock = asyncio.Lock()
7963 self.logs_queue: list[dict] = []
7964 self.log_task: asyncio.Task | None = None
7965 self.is_realtime_logging = False
7966
7967 async def generate_portMappings(self, miner_hotkey, executor_id, internal_ports=None):
7968 try:
7969 docker_internal_ports = [22, 20000, 20001, 20002, 20003]
7970 if internal_ports:
7971 docker_internal_ports = internal_ports
7972
7973 key = f"{AVAILABLE_PORT_MAPS_PREFIX}:{miner_hotkey}:{executor_id}"
7974 available_port_maps = await self.redis_service.lrange(key)
7975
7976 logger.info(f"available_port_maps: {key}, {available_port_maps}")
7977
7978 mappings = []
7979 for i, docker_port in enumerate(docker_internal_ports):
7980 if i < len(available_port_maps):
7981 internal_port, external_port = map(
7982 int, available_port_maps[i].decode().split(",")
7983 )
7984 mappings.append((docker_port, internal_port, external_port))
7985 else:
7986 break
7987 return mappings
7988 except Exception as e:
7989 logger.error(f"Error generating port mappings: {e}", exc_info=True)
7990 return []
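
generate_portMappings expects every Redis list entry under AVAILABLE_PORT_MAPS_PREFIX:<hotkey>:<executor_id> to be a bytes value of the form b"<internal_port>,<external_port>" (inferred from the .decode().split(",") above; the writer side is not in this excerpt). A hedged seeding sketch, where lpush is an assumed helper on RedisService:

# Hedged sketch of the entry format parsed above; `lpush` and the port values
# are assumptions, since only `lrange` appears in this excerpt.
async def seed_port_maps(redis_service, miner_hotkey: str, executor_id: str):
    key = f"{AVAILABLE_PORT_MAPS_PREFIX}:{miner_hotkey}:{executor_id}"
    for internal, external in [(40000, 40000), (40001, 40001)]:   # hypothetical ports
        await redis_service.lpush(key, f"{internal},{external}".encode())
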
7991
7992 async def execute_and_stream_logs(
7993 self,
7994 ssh_client: asyncssh.SSHClientConnection,
7995 command: str,
7996 log_tag: str,
7997 ):
7998 status = True
7999 error = ''
8000 async with ssh_client.create_process(command) as process:
8001 async for line in process.stdout:
8002 async with self.lock:
8003 self.logs_queue.append(
8004 {
8005 "log_text": line.strip(),
8006 "log_status": "success",
8007 "log_tag": log_tag,
8008 }
8009 )
8010
8011 async for line in process.stderr:
8012 status = False
8013 error += line
8014 async with self.lock:
8015 self.logs_queue.append(
8016 {
8017 "log_text": line.strip(),
8018 "log_status": "error",
8019 "log_tag": log_tag,
8020 }
8021 )
8022
8023 return status, error
8024
8025 async def handle_stream_logs(
8026 self,
8027 miner_hotkey,
8028 executor_id,
8029 ):
8030 default_extra = {
8031 "miner_hotkey": miner_hotkey,
8032 "executor_uuid": executor_id,
8033 }
8034
8035 self.is_realtime_logging = True
8036
8037 while True:
8038 await asyncio.sleep(LOG_STREAM_INTERVAL)
8039
8040 async with self.lock:
8041 logs_to_process = self.logs_queue[:]
8042 self.logs_queue.clear()
8043
8044 if logs_to_process:
8045 try:
8046 await self.redis_service.publish(
8047 STREAMING_LOG_CHANNEL,
8048 {
8049 "logs": logs_to_process,
8050 "miner_hotkey": miner_hotkey,
8051 "executor_uuid": executor_id,
8052 },
8053 )
8054
8055 logger.info(
8056 _m(
8057 f"Successfully published {len(logs_to_process)} logs",
8058 extra=get_extra_info(default_extra),
8059 )
8060 )
8061
8062 except Exception as e:
8063 logger.error(
8064 _m(
8065 "Error publishing log stream",
8066 extra=get_extra_info({**default_extra, "error": str(e)}),
8067 ),
8068 exc_info=True,
8069 )
8070
8071 if not self.is_realtime_logging:
8072 break
8073
8074 logger.info(
8075 _m(
8076 "Exit handle_stream_logs",
8077 extra=get_extra_info(default_extra),
8078 )
8079 )
8080
8081 async def finish_stream_logs(self):
8082 self.is_realtime_logging = False
8083 if self.log_task:
8084 await self.log_task
8085
8086 async def check_container_running(
8087 self, ssh_client: asyncssh.SSHClientConnection, container_name: str, timeout: int = 10
8088 ):
8089 """Check if the container is running"""
8090 start_time = time.time()
8091 while time.time() - start_time < timeout:
8092 result = await ssh_client.run(f"docker ps -q -f name={container_name}")
8093 if result.stdout.strip():
8094 return True
8095 await asyncio.sleep(1)
8096 return False
8097
8098 async def clean_existing_containers(
8099 self,
8100 ssh_client: asyncssh.SSHClientConnection,
8101 default_extra: dict,
8102 ):
8103 command = 'docker ps -a --filter "name=^/container_" --format "{{.ID}}"'
8104 result = await ssh_client.run(command)
8105 if result.stdout.strip():
8106 ids = " ".join(result.stdout.strip().split("\n"))
8107
8108 logger.info(
8109 _m(
8110 "Cleaning existing docker containers",
8111 extra=get_extra_info({
8112 **default_extra,
8113 "command": command,
8114 "ids": ids,
8115 }),
8116 ),
8117 )
8118
8119 command = f'docker rm {ids} -f'
8120 await ssh_client.run(command)
8121
8122 command = 'docker volume prune -af'
8123 await ssh_client.run(command)
8124
8125 async def clear_verified_job_count(self, executor_info: ExecutorSSHInfo):
8126 await self.redis_service.set_verified_job_count(executor_info.uuid, 0)
8127
8128 async def create_container(
8129 self,
8130 payload: ContainerCreateRequest,
8131 executor_info: ExecutorSSHInfo,
8132 keypair: bittensor.Keypair,
8133 private_key: str,
8134 ):
8135 default_extra = {
8136 "miner_hotkey": payload.miner_hotkey,
8137 "executor_uuid": payload.executor_id,
8138 "executor_ip_address": executor_info.address,
8139 "executor_port": executor_info.port,
8140 "executor_ssh_username": executor_info.ssh_username,
8141 "executor_ssh_port": executor_info.ssh_port,
8142 "docker_image": payload.docker_image,
8143 "debug": payload.debug,
8144 }
8145
8146 logger.info(
8147 _m(
8148 "Create Docker Container",
8149 extra=get_extra_info({**default_extra, "payload": str(payload)}),
8150 ),
8151 )
8152
8153 log_tag = "container_creation"
8154 custom_options = payload.custom_options
8155
8156 try:
8157 # generate port maps
8158 if custom_options and custom_options.internal_ports:
8159 port_maps = await self.generate_portMappings(
8160 payload.miner_hotkey, payload.executor_id, custom_options.internal_ports
8161 )
8162 else:
8163 port_maps = await self.generate_portMappings(
8164 payload.miner_hotkey, payload.executor_id
8165 )
8166
8167 if not port_maps:
8168 log_text = "No port mappings found"
8169 logger.error(log_text)
8170
8171 await self.clear_verified_job_count(executor_info)
8172
8173 return FailedContainerRequest(
8174 miner_hotkey=payload.miner_hotkey,
8175 executor_id=payload.executor_id,
8176 msg=str(log_text),
8177 error_code=FailedContainerErrorCodes.NoPortMappings,
8178 )
8179
8180 private_key = self.ssh_service.decrypt_payload(keypair.ss58_address, private_key)
8181 pkey = asyncssh.import_private_key(private_key)
8182
8183 async with asyncssh.connect(
8184 host=executor_info.address,
8185 port=executor_info.ssh_port,
8186 username=executor_info.ssh_username,
8187 client_keys=[pkey],
8188 known_hosts=None,
8189 ) as ssh_client:
8190 await self.clean_existing_containers(ssh_client=ssh_client, default_extra=default_extra)
8191
8192 logger.info(
8193 _m(
8194 "Pulling docker image",
8195 extra=get_extra_info({
8196 **default_extra,
8197 "docker_image": payload.docker_image
8198 }),
8199 ),
8200 )
8201
8202 # set real-time logging
8203 self.log_task = asyncio.create_task(
8204 self.handle_stream_logs(
8205 miner_hotkey=payload.miner_hotkey,
8206 executor_id=payload.executor_id,
8207 )
8208 )
8209
8210 async with self.lock:
8211 self.logs_queue.append(
8212 {
8213 "log_text": f"Pulling docker image {payload.docker_image}",
8214 "log_status": "success",
8215 "log_tag": log_tag,
8216 }
8217 )
8218
8219 command = f"docker pull {payload.docker_image}"
8220 status, error = await self.execute_and_stream_logs(
8221 ssh_client=ssh_client,
8222 command=command,
8223 log_tag=log_tag,
8224 )
8225 if not status:
8226 log_text = _m(
8227 "Docker pull failed",
8228 extra=get_extra_info({
8229 **default_extra,
8230 "error": error,
8231 }),
8232 )
8233 logger.error(log_text)
8234
8235 await self.finish_stream_logs()
8236 await self.clear_verified_job_count(executor_info)
8237
8238 return FailedContainerRequest(
8239 miner_hotkey=payload.miner_hotkey,
8240 executor_id=payload.executor_id,
8241 msg=str(log_text),
8242 error_code=FailedContainerErrorCodes.UnknownError,
8243 )
8244
8245 port_flags = " ".join(
8246 [
8247 f"-p {internal_port}:{docker_port}"
8248 for docker_port, internal_port, _ in port_maps
8249 ]
8250 )
8251
8252 # Prepare extra options
8253 sanitized_volumes = [
8254 volume for volume
8255 in (custom_options.volumes if custom_options and custom_options.volumes else [])
8256 if volume.strip()
8257 ]
8258 volume_flags = (
8259 " ".join([f"-v {volume}" for volume in sanitized_volumes])
8260 if sanitized_volumes
8261 else ""
8262 )
8263 entrypoint_flag = (
8264 f"--entrypoint {custom_options.entrypoint}"
8265 if custom_options
8266 and custom_options.entrypoint
8267 and custom_options.entrypoint.strip()
8268 else ""
8269 )
8270 env_flags = (
8271 " ".join(
8272 [
8273 f"-e {key}={value}"
8274 for key, value in custom_options.environment.items()
8275 if key and value and key.strip() and value.strip()
8276 ]
8277 )
8278 if custom_options and custom_options.environment
8279 else ""
8280 )
8281 startup_commands = (
8282 f"{custom_options.startup_commands}"
8283 if custom_options
8284 and custom_options.startup_commands
8285 and custom_options.startup_commands.strip()
8286 else ""
8287 )
8288
8289 uuid = uuid4()
8290
8291 # create docker volume
8292 async with self.lock:
8293 self.logs_queue.append(
8294 {
8295 "log_text": "Creating docker volume",
8296 "log_status": "success",
8297 "log_tag": log_tag,
8298 }
8299 )
8300
8301 volume_name = f"volume_{uuid}"
8302 command = f"docker volume create {volume_name}"
8303 status, error = await self.execute_and_stream_logs(
8304 ssh_client=ssh_client, command=command, log_tag="container_creation"
8305 )
8306 if not status:
8307 log_text = _m(
8308 "Docker volume creation failed",
8309 extra=get_extra_info({
8310 **default_extra,
8311 "error": error
8312 }),
8313 )
8314 logger.error(log_text)
8315
8316 await self.finish_stream_logs()
8317 await self.clear_verified_job_count(executor_info)
8318
8319 return FailedContainerRequest(
8320 miner_hotkey=payload.miner_hotkey,
8321 executor_id=payload.executor_id,
8322 msg=str(log_text),
8323 error_code=FailedContainerErrorCodes.UnknownError,
8324 )
8325
8326 logger.info(
8327 _m(
8328 "Created Docker Volume",
8329 extra=get_extra_info({**default_extra, "volume_name": volume_name}),
8330 ),
8331 )
8332
8333 # create docker container with the port map & resource
8334 async with self.lock:
8335 self.logs_queue.append(
8336 {
8337 "log_text": "Creating docker container",
8338 "log_status": "success",
8339 "log_tag": log_tag,
8340 }
8341 )
8342
8343 container_name = f"container_{uuid}"
8344
8345 if payload.debug:
8346 command = f'docker run -d {port_flags} -v "/var/run/docker.sock:/var/run/docker.sock" {volume_flags} {entrypoint_flag} -e PUBLIC_KEY="{payload.user_public_key}" {env_flags} --mount source={volume_name},target=/root --name {container_name} {payload.docker_image} {startup_commands}'
8347 else:
8348 command = f'docker run -d {port_flags} {volume_flags} {entrypoint_flag} -e PUBLIC_KEY="{payload.user_public_key}" {env_flags} --mount source={volume_name},target=/root --gpus all --name {container_name} {payload.docker_image} {startup_commands}'
8349
8350 logger.info(
8351 _m(
8352 "Creating docker container",
8353 extra=get_extra_info({
8354 **default_extra,
8355 "command": command,
8356 }),
8357 ),
8358 )
8359
8360 status, error = await self.execute_and_stream_logs(
8361 ssh_client=ssh_client, command=command, log_tag="container_creation"
8362 )
8363 if not status:
8364 log_text = _m(
8365 "Docker container creation failed",
8366 extra=get_extra_info({
8367 **default_extra,
8368 "command": command,
8369 "error": error,
8370 }),
8371 )
8372 logger.error(log_text)
8373
8374 await self.finish_stream_logs()
8375 await self.clear_verified_job_count(executor_info)
8376
8377 return FailedContainerRequest(
8378 miner_hotkey=payload.miner_hotkey,
8379 executor_id=payload.executor_id,
8380 msg=str(log_text),
8381 error_code=FailedContainerErrorCodes.UnknownError,
8382 )
8383
8384 # check if the container is running correctly
8385 if not await self.check_container_running(ssh_client, container_name):
8386 log_text = _m(
8387 "Run docker run command but container is not running",
8388 extra=get_extra_info({
8389 **default_extra,
8390 "container_name": container_name,
8391 }),
8392 )
8393 logger.error(log_text)
8394
8395 await self.finish_stream_logs()
8396 await self.clear_verified_job_count(executor_info)
8397
8398 return FailedContainerRequest(
8399 miner_hotkey=payload.miner_hotkey,
8400 executor_id=payload.executor_id,
8401 msg=str(log_text),
8402 error_code=FailedContainerErrorCodes.ContainerNotRunning,
8403 )
8404
8405 logger.info(
8406 _m(
8407 "Created Docker Container",
8408 extra=get_extra_info({**default_extra, "container_name": container_name}),
8409 ),
8410 )
8411
8412 await self.finish_stream_logs()
8413
8414 await self.redis_service.add_rented_machine(
8415 RentedMachine(
8416 miner_hotkey=payload.miner_hotkey,
8417 executor_id=payload.executor_id,
8418 executor_ip_address=executor_info.address,
8419 executor_ip_port=str(executor_info.port),
8420 )
8421 )
8422
8423 return ContainerCreatedResult(
8424 container_name=container_name,
8425 volume_name=volume_name,
8426 port_maps=[
8427 (docker_port, external_port) for docker_port, _, external_port in port_maps
8428 ],
8429 )
8430 except Exception as e:
8431 log_text = _m(
8432 "Unknown Error create_container",
8433 extra=get_extra_info({**default_extra, "error": str(e)}),
8434 )
8435 logger.error(log_text, exc_info=True)
8436
8437 await self.finish_stream_logs()
8438 await self.clear_verified_job_count(executor_info)
8439
8440 return FailedContainerRequest(
8441 miner_hotkey=payload.miner_hotkey,
8442 executor_id=payload.executor_id,
8443 msg=str(log_text),
8444 error_code=FailedContainerErrorCodes.UnknownError,
8445 )
8446
8447 async def stop_container(
8448 self,
8449 payload: ContainerStopRequest,
8450 executor_info: ExecutorSSHInfo,
8451 keypair: bittensor.Keypair,
8452 private_key: str,
8453 ):
8454 default_extra = {
8455 "miner_hotkey": payload.miner_hotkey,
8456 "executor_uuid": payload.executor_id,
8457 "executor_ip_address": executor_info.address,
8458 "executor_port": executor_info.port,
8459 "executor_ssh_username": executor_info.ssh_username,
8460 "executor_ssh_port": executor_info.ssh_port,
8461 }
8462
8463 logger.info(
8464 _m(
8465 "Stop Docker Container", extra=get_extra_info({**default_extra, "payload": str(payload)})
8466 ),
8467 )
8468
8469 private_key = self.ssh_service.decrypt_payload(keypair.ss58_address, private_key)
8470 pkey = asyncssh.import_private_key(private_key)
8471
8472 async with asyncssh.connect(
8473 host=executor_info.address,
8474 port=executor_info.ssh_port,
8475 username=executor_info.ssh_username,
8476 client_keys=[pkey],
8477 known_hosts=None,
8478 ) as ssh_client:
8479 await ssh_client.run(f"docker stop {payload.container_name}")
8480
8481 logger.info(
8482 _m(
8483 "Stopped Docker Container",
8484 extra=get_extra_info(
8485 {**default_extra, "container_name": payload.container_name}
8486 ),
8487 ),
8488 )
8489
8490 async def start_container(
8491 self,
8492 payload: ContainerStartRequest,
8493 executor_info: ExecutorSSHInfo,
8494 keypair: bittensor.Keypair,
8495 private_key: str,
8496 ):
8497 default_extra = {
8498 "miner_hotkey": payload.miner_hotkey,
8499 "executor_uuid": payload.executor_id,
8500 "executor_ip_address": executor_info.address,
8501 "executor_port": executor_info.port,
8502 "executor_ssh_username": executor_info.ssh_username,
8503 "executor_ssh_port": executor_info.ssh_port,
8504 }
8505
8506 logger.info(
8507 _m(
8508 "Restart Docker Container",
8509 extra=get_extra_info({**default_extra, "payload": str(payload)}),
8510 ),
8511 )
8512
8513 private_key = self.ssh_service.decrypt_payload(keypair.ss58_address, private_key)
8514 pkey = asyncssh.import_private_key(private_key)
8515
8516 async with asyncssh.connect(
8517 host=executor_info.address,
8518 port=executor_info.ssh_port,
8519 username=executor_info.ssh_username,
8520 client_keys=[pkey],
8521 known_hosts=None,
8522 ) as ssh_client:
8523 await ssh_client.run(f"docker start {payload.container_name}")
8524 logger.info(
8525 _m(
8526 "Started Docker Container",
8527 extra=get_extra_info(
8528 {**default_extra, "container_name": payload.container_name}
8529 ),
8530 ),
8531 )
8532
8533 async def delete_container(
8534 self,
8535 payload: ContainerDeleteRequest,
8536 executor_info: ExecutorSSHInfo,
8537 keypair: bittensor.Keypair,
8538 private_key: str,
8539 ):
8540 default_extra = {
8541 "miner_hotkey": payload.miner_hotkey,
8542 "executor_uuid": payload.executor_id,
8543 "executor_ip_address": executor_info.address,
8544 "executor_port": executor_info.port,
8545 "executor_ssh_username": executor_info.ssh_username,
8546 "executor_ssh_port": executor_info.ssh_port,
8547 }
8548
8549 logger.info(
8550 _m(
8551 "Delete Docker Container",
8552 extra=get_extra_info({**default_extra, "payload": str(payload)}),
8553 ),
8554 )
8555
8556 private_key = self.ssh_service.decrypt_payload(keypair.ss58_address, private_key)
8557 pkey = asyncssh.import_private_key(private_key)
8558
8559 async with asyncssh.connect(
8560 host=executor_info.address,
8561 port=executor_info.ssh_port,
8562 username=executor_info.ssh_username,
8563 client_keys=[pkey],
8564 known_hosts=None,
8565 ) as ssh_client:
8566 # await ssh_client.run(f"docker stop {payload.container_name}")
8567 await ssh_client.run(f"docker rm {payload.container_name} -f")
8568 await ssh_client.run(f"docker volume rm {payload.volume_name} -f")
8569
8570 logger.info(
8571 _m(
8572 "Deleted Docker Container",
8573 extra=get_extra_info(
8574 {
8575 **default_extra,
8576 "container_name": payload.container_name,
8577 "volume_name": payload.volume_name,
8578 }
8579 ),
8580 ),
8581 )
8582
8583 await self.redis_service.remove_rented_machine(
8584 RentedMachine(
8585 miner_hotkey=payload.miner_hotkey,
8586 executor_id=payload.executor_id,
8587 executor_ip_address=executor_info.address,
8588 executor_ip_port=str(executor_info.port),
8589 )
8590 )
8591
8592 async def get_docker_hub_digests(self, repositories) -> dict[str, str]:
8593 """Retrieve all tags and their corresponding digests from Docker Hub."""
8594 all_digests = {} # Initialize a dictionary to store all tag-digest pairs
8595
8596 async with aiohttp.ClientSession() as session:
8597 for repo in repositories:
8598 try:
8599 # Split repository and tag if specified
8600 if ":" in repo:
8601 repository, specified_tag = repo.split(":", 1)
8602 else:
8603 repository, specified_tag = repo, None
8604
8605 # Get authorization token
8606 async with session.get(
8607 f"https://auth.docker.io/token?service=registry.docker.io&scope=repository:{repository}:pull"
8608 ) as token_response:
8609 token_response.raise_for_status()
8610 token = await token_response.json()
8611 token = token.get("token")
8612
8613 # Find all tags if no specific tag is specified
8614 if specified_tag is None:
8615 async with session.get(
8616 f"https://index.docker.io/v2/{repository}/tags/list",
8617 headers={"Authorization": f"Bearer {token}"},
8618 ) as tags_response:
8619 tags_response.raise_for_status()
8620 tags_data = await tags_response.json()
8621 all_tags = tags_data.get("tags", [])
8622 else:
8623 all_tags = [specified_tag]
8624
8625 # Dictionary to store tag-digest pairs for the current repository
8626 tag_digests = {}
8627 for tag in all_tags:
8628 # Get image digest
8629 async with session.head(
8630 f"https://index.docker.io/v2/{repository}/manifests/{tag}",
8631 headers={
8632 "Authorization": f"Bearer {token}",
8633 "Accept": "application/vnd.docker.distribution.manifest.v2+json",
8634 },
8635 ) as manifest_response:
8636 manifest_response.raise_for_status()
8637 digest = manifest_response.headers.get("Docker-Content-Digest")
8638 tag_digests[f"{repository}:{tag}"] = digest
8639
8640 # Update the all_digests dictionary with the current repository's tag-digest pairs
8641 all_digests.update(tag_digests)
8642
8643 except aiohttp.ClientError as e:
8644 print(f"Error retrieving data for {repo}: {e}")
8645
8646 return all_digests
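
A short usage sketch resolving the digests for the REPOSITORIES list pinned at the top of this file:

# Usage sketch for get_docker_hub_digests() with the pinned repositories.
async def print_repo_digests(service: DockerService) -> None:
    digests = await service.get_docker_hub_digests(REPOSITORIES)
    for image, digest in digests.items():
        print(image, digest)
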
8647
8648 async def setup_ssh_access(
8649 self,
8650 ssh_client: asyncssh.SSHClientConnection,
8651 container_name: str,
8652 ip_address: str,
8653 username: str = "root",
8654 port_maps: list[tuple[int, int]] = None,
8655 ) -> tuple[bool, str, str]:
8656 """Generate an SSH key pair, add the public key to the Docker container, and check SSH connection."""
8657
8658 my_key = "my_key"
8659 private_key, public_key = self.ssh_service.generate_ssh_key(my_key)
8660
8661 public_key = public_key.decode("utf-8")
8662 private_key = private_key.decode("utf-8")
8663
8664 private_key = self.ssh_service.decrypt_payload(my_key, private_key)
8665 pkey = asyncssh.import_private_key(private_key)
8666
8667 await asyncio.sleep(5)
8668
8669 command = f"docker exec {container_name} sh -c 'echo \"{public_key}\" >> /root/.ssh/authorized_keys'"
8670
8671 result = await ssh_client.run(command)
8672 if result.exit_status != 0:
8673 log_text = "Error creating docker connection"
8674 log_status = "error"
8675 logger.error(log_text)
8676
8677 return False, log_text, log_status
8678
8679 port = 0
8680 for internal, external in port_maps:
8681 if internal == 22:
8682 port = external
8683 # Check SSH connection
8684 try:
8685 async with asyncssh.connect(
8686 host=ip_address,
8687 port=port,
8688 username=username,
8689 client_keys=[pkey],
8690 known_hosts=None,
8691 ):
8692 log_status = "info"
8693 log_text = "SSH connection successful!"
8694 logger.info(
8695 _m(
8696 log_text,
8697 extra={
8698 "container_name": container_name,
8699 "ip_address": ip_address,
8700 "port_maps": port_maps,
8701 },
8702 )
8703 )
8704 return True, log_text, log_status
8705 except Exception as e:
8706 log_text = "SSH connection failed"
8707 log_status = "error"
8708 logger.error(
8709 _m(
8710 log_text,
8711 extra={
8712 "container_name": container_name,
8713 "ip_address": ip_address,
8714 "port_maps": port_maps,
8715 "error": str(e),
8716 },
8717 )
8718 )
8719 return False, log_text, log_status
8720
8721
8722
8723---
8724File: /neurons/validators/src/services/file_encrypt_service.py
8725---
8726
8727import os
8728import random
8729import subprocess
8730from typing import Annotated
8731from pathlib import Path
8732import tempfile
8733import shutil
8734import PyInstaller.__main__
8735from fastapi import Depends
8736
8737from services.ssh_service import SSHService
8738
8739from payload_models.payloads import MinerJobEnryptedFiles
8740
8741
8742class FileEncryptService:
8743 def __init__(
8744 self,
8745 ssh_service: Annotated[SSHService, Depends(SSHService)],
8746 ):
8747 self.ssh_service = ssh_service
8748
8749 def make_obfuscated_file(self, tmp_directory: str, file_path: str):
8750 subprocess.run(
8751 ['pyarmor', 'gen', '-O', tmp_directory, file_path],
8752 stdout=subprocess.PIPE,
8753 stderr=subprocess.PIPE,
8754 )
8755 return os.path.basename(file_path)
8756
8757 def make_binary_file(self, tmp_directory: str, file_path: str):
8758 file_name = os.path.basename(file_path)
8759
8760 PyInstaller.__main__.run([
8761 file_path,
8762 '--onefile',
8763 '--noconsole',
8764 '--log-level=ERROR',
8765 '--distpath', tmp_directory,
8766 '--name', file_name,
8767 ])
8768
8769 subprocess.run(['rm', '-rf', 'build', f'{file_name}.spec'])
8770
8771 return file_name
8772
8773 def make_binary_file_with_nuitka(self, tmp_directory: str, file_path: str):
8774 file_name = os.path.basename(file_path)
8775
8776 subprocess.run([
8777 'nuitka', '--standalone', '--onefile',
8778 f'--output-dir={tmp_directory}',
8779 '--remove-output', '--quiet', '--no-progress',
8780 f'--output-filename={file_name}',
8781 file_path
8782 ])
8783
8784 return file_name
8785
8786 def ecrypt_miner_job_files(self):
8787 tmp_directory = Path(__file__).parent / "temp"
8788 if tmp_directory.exists() and tmp_directory.is_dir():
8789 shutil.rmtree(tmp_directory)
8790
8791 string_count = random.randint(10, 100)
8792 encrypt_key = self.ssh_service.generate_random_string(string_count)
8793
8794 machine_scrape_file_path = str(
8795 Path(__file__).parent / ".." / "miner_jobs/machine_scrape.py"
8796 )
8797 with open(machine_scrape_file_path, 'r') as file:
8798 content = file.read()
8799 modified_content = content.replace('encrypt_key', encrypt_key)
8800
8801 with tempfile.NamedTemporaryFile(delete=True) as machine_scrape_file:
8802 machine_scrape_file.write(modified_content.encode('utf-8'))
8803 machine_scrape_file.flush()
8804 os.fsync(machine_scrape_file.fileno())
8805 machine_scrape_file_name = self.make_binary_file_with_nuitka(str(tmp_directory), machine_scrape_file.name)
8806
8807 # generate score_script file
8808 score_script_file_path = str(Path(__file__).parent / ".." / "miner_jobs/score.py")
8809 with open(score_script_file_path, 'r') as file:
8810 content = file.read()
8811 modified_content = content.replace('encrypt_key', encrypt_key)
8812
8813 with tempfile.NamedTemporaryFile(delete=True, suffix='.py') as score_file:
8814 score_file.write(modified_content.encode('utf-8'))
8815 score_file.flush()
8816 os.fsync(score_file.fileno())
8817 score_file_name = self.make_obfuscated_file(str(tmp_directory), score_file.name)
8818
8819 return MinerJobEnryptedFiles(
8820 encrypt_key=encrypt_key,
8821 tmp_directory=str(tmp_directory),
8822 machine_scrape_file_name=machine_scrape_file_name,
8823 score_file_name=score_file_name,
8824 )
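
The literal 'encrypt_key' that appears in machine_scrape.py and score.py (for example, key = 'encrypt_key' at the bottom of machine_scrape.py) is a placeholder: the method above substitutes a per-run random string before the scripts are compiled with Nuitka or obfuscated with pyarmor. A minimal sketch of that substitution:

# Minimal sketch of the placeholder substitution performed above.
source = "key = 'encrypt_key'\n"      # as written in the miner job scripts
per_run_secret = "hZ3k9..."           # hypothetical generate_random_string() output
compiled_input = source.replace("encrypt_key", per_run_secret)
assert compiled_input == "key = 'hZ3k9...'\n"
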
8825
8826
8827
8828---
8829File: /neurons/validators/src/services/hash_service.py
8830---
8831
8832import enum
8833import hashlib
8834import string
8835import random
8836import secrets
8837from dataclasses import dataclass
8838from base64 import b64encode
8839import json
8840from typing import Self
8841import subprocess
8842
8843
8844class Algorithm(enum.Enum):
8845 SHA256 = "SHA256"
8846 SHA384 = "SHA384"
8847 SHA512 = "SHA512"
8848
8849 @property
8850 def params(self):
8851 return {
8852 Algorithm.SHA256: {
8853 "hash_function": hashlib.sha256,
8854 "hash_type": "1410",
8855 },
8856 Algorithm.SHA384: {
8857 "hash_function": hashlib.sha384,
8858 "hash_type": "10810",
8859 },
8860 Algorithm.SHA512: {
8861 "hash_function": hashlib.sha512,
8862 "hash_type": "1710",
8863 },
8864 }
8865
8866 def hash(self, *args, **kwargs):
8867 return self.params[self]["hash_function"](*args, **kwargs)
8868
8869 @property
8870 def type(self):
8871 return self.params[self]["hash_type"]
8872
8873
8874@dataclass
8875class JobParam:
8876 algorithm: Algorithm
8877 num_letters: int
8878 num_digits: int
8879 num_hashes: int
8880
8881 @classmethod
8882 def generate(
8883 cls,
8884 num_letters: int,
8885 num_digits: int,
8886 num_hashes: int,
8887 ) -> Self:
8888 algorithm = random.choice(list(Algorithm))
8889
8890 return cls(
8891 algorithm=algorithm,
8892 num_letters=num_letters,
8893 num_digits=num_digits,
8894 num_hashes=num_hashes,
8895 )
8896
8897 @property
8898 def password_length(self) -> int:
8899 return self.num_letters + self.num_digits
8900
8901 def __str__(self) -> str:
8902 return (
8903 f"algorithm={self.algorithm} "
8904 f"algorithm_type={self.algorithm.type} "
8905 f"num_letters={self.num_letters} "
8906 f"num_digits={self.num_digits} "
8907 f"num_hashes={self.num_hashes}"
8908 )
8909
8910
8911@dataclass
8912class HashcatJob:
8913 passwords: list[list[str]]
8914 salts: list[bytes]
8915 job_params: list[JobParam]
8916
8917
8918@dataclass
8919class HashService:
8920 gpu_count: int
8921 num_job_params: int
8922 jobs: list[HashcatJob]
8923 timeout: int
8924
8925 @classmethod
8926    def random_string(cls, num_letters: int, num_digits: int) -> str:
8927 return ''.join(random.choices(string.ascii_letters, k=num_letters)) + ''.join(random.choices(string.digits, k=num_digits))
8928
8929 @classmethod
8930 def generate(
8931 cls,
8932 gpu_count: int = 1,
8933 timeout: int = 60,
8934 num_job_params: int = 1,
8935 num_letters: int = 0,
8936 num_digits: int = 11,
8937 num_hashes: int = 10,
8938 salt_length_bytes: int = 8
8939 ) -> Self:
8940 jobs = []
8941 for _ in range(gpu_count):
8942 job_params = [
8943 JobParam.generate(
8944 num_letters=num_letters,
8945 num_digits=num_digits,
8946 num_hashes=num_hashes,
8947 )
8948 for _ in range(num_job_params)
8949 ]
8950
8951 passwords = [
8952 sorted(
8953 {
8954 cls.random_string(
8955 num_letters=_params.num_letters, num_digits=_params.num_digits
8956 )
8957 for _ in range(_params.num_hashes)
8958 }
8959 )
8960 for _params in job_params
8961 ]
8962
8963 salts = [secrets.token_bytes(salt_length_bytes) for _ in range(num_job_params)]
8964
8965 jobs.append(HashcatJob(
8966 job_params=job_params,
8967 passwords=passwords,
8968 salts=salts,
8969 ))
8970
8971 return cls(
8972 gpu_count=gpu_count,
8973 num_job_params=num_job_params,
8974 jobs=jobs,
8975 timeout=timeout,
8976 )
8977
8978 def hash_masks(self, job: HashcatJob) -> list[str]:
8979 return ["?1" * param.num_letters + "?d" * param.num_digits for param in job.job_params]
8980
8981    def hash_hexes(self, algorithm: Algorithm, passwords: list[str], salt: bytes) -> list[str]:
8982 return [
8983 algorithm.hash(password.encode("ascii") + salt).hexdigest()
8984 for password in passwords
8985 ]
8986
8987 def _hash(self, s: bytes) -> bytes:
8988 return b64encode(hashlib.sha256(s).digest(), altchars=b"-_")
8989
8990 # def _payload(self, i) -> str:
8991 # return "\n".join([f"{hash_hex}:{self.salts[i].hex()}" for hash_hex in self.hash_hexes(i)])
8992
8993 def _payloads(self, job: HashcatJob) -> list[str]:
8994 payloads = [
8995 "\n".join([
8996 f"{hash_hex}:{job.salts[i].hex()}"
8997 for hash_hex
8998 in self.hash_hexes(job.job_params[i].algorithm, job.passwords[i], job.salts[i])
8999 ])
9000 for i in range(self.num_job_params)
9001 ]
9002 return payloads
9003
9004 @property
9005 def payload(self) -> str | bytes:
9006 """Convert this instance to a hashcat argument format."""
9007
9008 data = {
9009 "gpu_count": self.gpu_count,
9010 "num_job_params": self.num_job_params,
9011 "jobs": [
9012 {
9013 "payloads": self._payloads(job),
9014 "masks": self.hash_masks(job),
9015 "algorithms": [param.algorithm.type for param in job.job_params],
9016 }
9017 for job in self.jobs
9018 ],
9019 "timeout": self.timeout,
9020 }
9021 return json.dumps(data)
9022
9023 @property
9024 def answer(self) -> str:
9025 return self._hash(
9026 "".join(["".join(["".join(passwords) for passwords in job.passwords]) for job in self.jobs]).encode("utf-8")
9027 ).decode("utf-8")
9028
9029 def __str__(self) -> str:
9030 return f"JobService {self.jobs}"
9031
9032
9033if __name__ == "__main__":
9034 import time
9035
9036 hash_service = HashService.generate(gpu_count=1, timeout=50)
9037 # print(hash_service.payload)
9038 print('answer ====>', hash_service.answer)
9039
9040 start_time = time.time()
9041
9042 cmd = f"python src/miner_jobs/score.py '{hash_service.payload}'"
9043 result = subprocess.check_output(cmd, shell=True, text=True, stderr=subprocess.DEVNULL)
9044 end_time = time.time()
9045 print('result ===>', result)
9046 print(end_time - start_time)
9047
9048
9049
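# For reference, a minimal sketch showing how the expected answer can be recomputed
# independently of HashService: every password (jobs -> job params -> password lists,
# in order) is concatenated, SHA-256 hashed and URL-safe base64 encoded, mirroring
# HashService._hash and HashService.answer. The usage lines at the end are an
# illustrative self-check, not part of the project's test suite.
import hashlib
from base64 import b64encode


def expected_answer(passwords_per_job: list[list[list[str]]]) -> str:
    joined = "".join(
        "".join("".join(password_list) for password_list in job_passwords)
        for job_passwords in passwords_per_job
    )
    digest = hashlib.sha256(joined.encode("utf-8")).digest()
    return b64encode(digest, altchars=b"-_").decode("utf-8")


# Illustrative usage:
#   hs = HashService.generate(gpu_count=1)
#   assert expected_answer([job.passwords for job in hs.jobs]) == hs.answer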
9050---
9051File: /neurons/validators/src/services/ioc.py
9052---
9053
9054import asyncio
9055
9056from services.docker_service import DockerService
9057from services.miner_service import MinerService
9058from services.ssh_service import SSHService
9059from services.task_service import TaskService
9060from services.redis_service import RedisService
9061from services.file_encrypt_service import FileEncryptService
9062
9063ioc = {}
9064
9065
9066async def initiate_services():
9067 ioc["SSHService"] = SSHService()
9068 ioc["RedisService"] = RedisService()
9069 ioc["TaskService"] = TaskService(
9070 ssh_service=ioc["SSHService"],
9071 redis_service=ioc["RedisService"]
9072 )
9073 ioc["DockerService"] = DockerService(
9074 ssh_service=ioc["SSHService"],
9075 redis_service=ioc["RedisService"]
9076 )
9077 ioc["MinerService"] = MinerService(
9078 ssh_service=ioc["SSHService"],
9079 task_service=ioc["TaskService"],
9080 redis_service=ioc["RedisService"]
9081 )
9082 ioc["FileEncryptService"] = FileEncryptService(
9083 ssh_service=ioc["SSHService"],
9084 )
9085
9086
9087def sync_initiate():
9088 loop = asyncio.get_event_loop()
9089 loop.run_until_complete(initiate_services())
9090
9091
9092sync_initiate()
9093
9094
9095
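# A minimal alternative bootstrap sketch: asyncio.get_event_loop() is deprecated in
# recent Python versions when no event loop is running, so an explicit loop can be
# created instead. sync_initiate_alt is an illustrative name, not part of the module.
def sync_initiate_alt():
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
    loop.run_until_complete(initiate_services())


# After bootstrap, consumers resolve services by name, e.g.:
#   docker_service = ioc["DockerService"]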
9096---
9097File: /neurons/validators/src/services/miner_service.py
9098---
9099
9100import asyncio
9101import json
9102import logging
9103from typing import Annotated
9104
9105import bittensor
9106from clients.miner_client import MinerClient
9107from datura.requests.miner_requests import (
9108 AcceptSSHKeyRequest,
9109 DeclineJobRequest,
9110 ExecutorSSHInfo,
9111 FailedRequest,
9112)
9113from datura.requests.validator_requests import SSHPubKeyRemoveRequest, SSHPubKeySubmitRequest
9114from fastapi import Depends
9115from payload_models.payloads import (
9116 ContainerBaseRequest,
9117 ContainerCreated,
9118 ContainerCreateRequest,
9119 ContainerDeleted,
9120 ContainerDeleteRequest,
9121 ContainerStarted,
9122 ContainerStartRequest,
9123 ContainerStopped,
9124 ContainerStopRequest,
9125 FailedContainerErrorCodes,
9126 FailedContainerRequest,
9127 MinerJobEnryptedFiles,
9128 MinerJobRequestPayload,
9129)
9130from protocol.vc_protocol.compute_requests import RentedMachine
9131
9132from core.config import settings
9133from core.utils import _m, get_extra_info
9134from services.docker_service import DockerService
9135from services.redis_service import EXECUTOR_COUNT_PREFIX, MACHINE_SPEC_CHANNEL_NAME, RedisService
9136from services.ssh_service import SSHService
9137from services.task_service import TaskService
9138
9139logger = logging.getLogger(__name__)
9140
9141
9142JOB_LENGTH = 300
9143
9144
9145class MinerService:
9146 def __init__(
9147 self,
9148 ssh_service: Annotated[SSHService, Depends(SSHService)],
9149 task_service: Annotated[TaskService, Depends(TaskService)],
9150 redis_service: Annotated[RedisService, Depends(RedisService)],
9151 ):
9152 self.ssh_service = ssh_service
9153 self.task_service = task_service
9154 self.redis_service = redis_service
9155
9156 async def request_job_to_miner(
9157 self,
9158 payload: MinerJobRequestPayload,
9159 encypted_files: MinerJobEnryptedFiles,
9160 docker_hub_digests: dict[str, str],
9161 debug=False,
9162 ):
9163 loop = asyncio.get_event_loop()
9164 my_key: bittensor.Keypair = settings.get_bittensor_wallet().get_hotkey()
9165 default_extra = {
9166 "job_batch_id": payload.job_batch_id,
9167 "miner_hotkey": payload.miner_hotkey,
9168 "miner_address": payload.miner_address,
9169 "miner_port": payload.miner_port,
9170 }
9171
9172 try:
9173 logger.info(_m("Requesting job to miner", extra=get_extra_info(default_extra)))
9174 miner_client = MinerClient(
9175 loop=loop,
9176 miner_address=payload.miner_address,
9177 miner_port=payload.miner_port,
9178 miner_hotkey=payload.miner_hotkey,
9179 my_hotkey=my_key.ss58_address,
9180 keypair=my_key,
9181 miner_url=f"ws://{payload.miner_address}:{payload.miner_port}/jobs/{my_key.ss58_address}"
9182 )
9183
9184 async with miner_client:
9185 # generate ssh key and send it to miner
9186 private_key, public_key = self.ssh_service.generate_ssh_key(my_key.ss58_address)
9187
9188 await miner_client.send_model(SSHPubKeySubmitRequest(public_key=public_key))
9189
9190 try:
9191 msg = await asyncio.wait_for(
9192 miner_client.job_state.miner_accepted_ssh_key_or_failed_future, JOB_LENGTH
9193 )
9194 except TimeoutError:
9195 logger.error(
9196 _m(
9197 "Waiting accepted ssh key or failed request from miner resulted in TimeoutError",
9198 extra=get_extra_info(default_extra),
9199 ),
9200 )
9201 msg = None
9202 except Exception:
9203 logger.error(
9204 _m(
9205 "Waiting accepted ssh key or failed request from miner resulted in an exception",
9206 extra=get_extra_info(default_extra),
9207 ),
9208 )
9209 msg = None
9210
9211 if isinstance(msg, AcceptSSHKeyRequest):
9212 logger.info(
9213 _m(
9214 "Received AcceptSSHKeyRequest for miner. Running tasks for executors",
9215 extra=get_extra_info(
9216 {**default_extra, "executors": len(msg.executors)}
9217 ),
9218 ),
9219 )
9220 if len(msg.executors) == 0:
9221 return None
9222
9223 tasks = [
9224 asyncio.create_task(
9225 self.task_service.create_task(
9226 miner_info=payload,
9227 executor_info=executor_info,
9228 keypair=my_key,
9229 private_key=private_key.decode("utf-8"),
9230 public_key=public_key.decode("utf-8"),
9231 encypted_files=encypted_files,
9232 docker_hub_digests=docker_hub_digests,
9233 debug=debug,
9234 )
9235 )
9236 for executor_info in msg.executors
9237 ]
9238
9239 results = [
9240 result
9241 for result in await asyncio.gather(*tasks, return_exceptions=True)
9242 if result
9243 ]
9244
9245 logger.info(
9246 _m(
9247 "Finished running tasks for executors",
9248 extra=get_extra_info({**default_extra, "executors": len(results)}),
9249 ),
9250 )
9251
9252 await miner_client.send_model(SSHPubKeyRemoveRequest(public_key=public_key))
9253
9254 await self.publish_machine_specs(results, miner_client.miner_hotkey)
9255 await self.store_executor_counts(
9256 payload.miner_hotkey, payload.job_batch_id, len(msg.executors), results
9257 )
9258
9259 total_score = 0
9260 for _, _, score, _, _, _, _ in results:
9261 total_score += score
9262
9263 logger.info(
9264 _m(
9265 f"total score: {total_score}",
9266 extra=get_extra_info(default_extra),
9267 )
9268 )
9269
9270 return {
9271 "miner_hotkey": payload.miner_hotkey,
9272 "score": total_score,
9273 }
9274 elif isinstance(msg, FailedRequest):
9275 logger.warning(
9276 _m(
9277 "Requesting job failed for miner",
9278 extra=get_extra_info({**default_extra, "msg": str(msg)}),
9279 ),
9280 )
9281 return None
9282 elif isinstance(msg, DeclineJobRequest):
9283 logger.warning(
9284 _m(
9285 "Requesting job declined for miner",
9286 extra=get_extra_info({**default_extra, "msg": str(msg)}),
9287 ),
9288 )
9289 return None
9290 else:
9291 logger.error(
9292 _m(
9293 "Unexpected msg",
9294 extra=get_extra_info({**default_extra, "msg": str(msg)}),
9295 ),
9296 )
9297 return None
9298 except asyncio.CancelledError:
9299 logger.error(
9300 _m("Requesting job to miner was cancelled", extra=get_extra_info(default_extra)),
9301 )
9302 return None
9303 except Exception as e:
9304 logger.error(
9305 _m(
9306 "Requesting job to miner resulted in an exception",
9307 extra=get_extra_info({**default_extra, "error": str(e)}),
9308 ),
9309 exc_info=True,
9310 )
9311 return None
9312
9313 async def publish_machine_specs(
9314 self, results: list[tuple[dict, ExecutorSSHInfo]], miner_hotkey: str
9315 ):
9316 """Publish machine specs to compute app connector process"""
9317 default_extra = {
9318 "miner_hotkey": miner_hotkey,
9319 }
9320
9321 logger.info(
9322 _m(
9323 "Publishing machine specs to compute app connector process",
9324 extra=get_extra_info({**default_extra, "results": len(results)}),
9325 ),
9326 )
9327 for (
9328 specs,
9329 ssh_info,
9330 score,
9331 synthetic_job_score,
9332 job_batch_id,
9333 log_status,
9334 log_text,
9335 ) in results:
9336 try:
9337 await self.redis_service.publish(
9338 MACHINE_SPEC_CHANNEL_NAME,
9339 {
9340 "specs": specs,
9341 "miner_hotkey": miner_hotkey,
9342 "executor_uuid": ssh_info.uuid,
9343 "executor_ip": ssh_info.address,
9344 "executor_port": ssh_info.port,
9345 "score": score,
9346 "synthetic_job_score": synthetic_job_score,
9347 "job_batch_id": job_batch_id,
9348 "log_status": log_status,
9349 "log_text": str(log_text),
9350 },
9351 )
9352 except Exception as e:
9353 logger.error(
9354 _m(
9355 f"Error publishing machine specs of {miner_hotkey} to compute app connector process",
9356 extra=get_extra_info({**default_extra, "error": str(e)}),
9357 ),
9358 exc_info=True,
9359 )
9360
9361 async def store_executor_counts(
9362 self, miner_hotkey: str, job_batch_id: str, total: int, results: list[dict]
9363 ):
9364 default_extra = {
9365 "job_batch_id": job_batch_id,
9366 "miner_hotkey": miner_hotkey,
9367 }
9368
9369 success = 0
9370 failed = 0
9371
9372 for _, _, score, _, _, _, _ in results:
9373 if score > 0:
9374 success += 1
9375 else:
9376 failed += 1
9377
9378 data = {"total": total, "success": success, "failed": failed}
9379
9380 key = f"{EXECUTOR_COUNT_PREFIX}:{miner_hotkey}"
9381
9382 try:
9383 await self.redis_service.hset(key, job_batch_id, json.dumps(data))
9384
9385 logger.info(
9386 _m(
9387 "Stored executor counts",
9388 extra=get_extra_info({**default_extra, **data}),
9389 ),
9390 )
9391 except Exception as e:
9392 logger.error(
9393 _m(
9394 "Failed storing executor counts",
9395 extra=get_extra_info({**default_extra, **data, "error": str(e)}),
9396 ),
9397 exc_info=True,
9398 )
9399
9400 async def handle_container(self, payload: ContainerBaseRequest):
9401 loop = asyncio.get_event_loop()
9402 my_key: bittensor.Keypair = settings.get_bittensor_wallet().get_hotkey()
9403 default_extra = {
9404 "miner_hotkey": payload.miner_hotkey,
9405 "executor_id": payload.executor_id,
9406 "executor_ip": payload.miner_address,
9407 "executor_port": payload.miner_port,
9408 "container_request_type": str(payload.message_type),
9409 }
9410
9411 docker_service = DockerService(
9412 ssh_service=self.ssh_service,
9413 redis_service=self.redis_service,
9414 )
9415
9416 try:
9417 miner_client = MinerClient(
9418 loop=loop,
9419 miner_address=payload.miner_address,
9420 miner_port=payload.miner_port,
9421 miner_hotkey=payload.miner_hotkey,
9422 my_hotkey=my_key.ss58_address,
9423 keypair=my_key,
9424 miner_url=f"ws://{payload.miner_address}:{payload.miner_port}/resources/{my_key.ss58_address}",
9425 )
9426
9427 async with miner_client:
9428 # generate ssh key and send it to miner
9429 private_key, public_key = self.ssh_service.generate_ssh_key(my_key.ss58_address)
9430 await miner_client.send_model(
9431 SSHPubKeySubmitRequest(public_key=public_key, executor_id=payload.executor_id)
9432 )
9433
9434 logger.info(
9435 _m("Sent SSH key to miner.", extra=get_extra_info(default_extra)),
9436 )
9437
9438 try:
9439 msg = await asyncio.wait_for(
9440 miner_client.job_state.miner_accepted_ssh_key_or_failed_future,
9441 timeout=JOB_LENGTH,
9442 )
9443 except TimeoutError:
9444 logger.error(
9445 _m(
9446                                "Waiting accepted ssh key or failed request from miner resulted in a timeout error",
9447 extra=get_extra_info(default_extra),
9448 ),
9449 )
9450 msg = None
9451 except Exception as e:
9452 logger.error(
9453 _m(
9454 "Waiting accepted ssh key or failed request from miner resulted in an exception",
9455 extra=get_extra_info({**default_extra, "error": str(e)}),
9456 ),
9457 )
9458 msg = None
9459
9460 if isinstance(msg, AcceptSSHKeyRequest):
9461 logger.info(
9462 _m(
9463 "Received AcceptSSHKeyRequest",
9464 extra=get_extra_info({**default_extra, "msg": str(msg)}),
9465 ),
9466 )
9467
9468 try:
9469 executor = msg.executors[0]
9470 except Exception as e:
9471 logger.error(
9472 _m(
9473 "Error: Miner didn't return executor info",
9474 extra=get_extra_info({**default_extra, "error": str(e)}),
9475 ),
9476 )
9477 executor = None
9478
9479 if executor is None or executor.uuid != payload.executor_id:
9480 logger.error(
9481 _m("Error: Invalid executor id", extra=get_extra_info(default_extra)),
9482 )
9483 await miner_client.send_model(
9484 SSHPubKeyRemoveRequest(
9485 public_key=public_key, executor_id=payload.executor_id
9486 )
9487 )
9488
9489 await self.redis_service.remove_rented_machine(
9490 RentedMachine(
9491 miner_hotkey=payload.miner_hotkey,
9492 executor_id=payload.executor_id,
9493 executor_ip_address=executor.address if executor else "",
9494 executor_ip_port=str(executor.port if executor else ""),
9495 )
9496 )
9497
9498 return FailedContainerRequest(
9499 miner_hotkey=payload.miner_hotkey,
9500 executor_id=payload.executor_id,
9501 msg=f"Invalid executor id {payload.executor_id}",
9502 error_code=FailedContainerErrorCodes.InvalidExecutorId,
9503 )
9504
9505 try:
9506 if isinstance(payload, ContainerCreateRequest):
9507 logger.info(
9508 _m(
9509 "Creating container",
9510 extra=get_extra_info(
9511 {**default_extra, "payload": str(payload)}
9512 ),
9513 ),
9514 )
9515 result = await docker_service.create_container(
9516 payload,
9517 executor,
9518 my_key,
9519 private_key.decode("utf-8"),
9520 )
9521
9522 await miner_client.send_model(
9523 SSHPubKeyRemoveRequest(
9524 public_key=public_key, executor_id=payload.executor_id
9525 )
9526 )
9527
9528 if isinstance(result, FailedContainerRequest):
9529 return result
9530
9531 return ContainerCreated(
9532 miner_hotkey=payload.miner_hotkey,
9533 executor_id=payload.executor_id,
9534 container_name=result.container_name,
9535 volume_name=result.volume_name,
9536 port_maps=result.port_maps,
9537 )
9538
9539 # elif isinstance(payload, ContainerStartRequest):
9540 # logger.info(
9541 # _m(
9542 # "Starting container",
9543 # extra=get_extra_info(
9544 # {**default_extra, "payload": str(payload)}
9545 # ),
9546 # ),
9547 # )
9548 # await docker_service.start_container(
9549 # payload,
9550 # executor,
9551 # my_key,
9552 # private_key.decode("utf-8"),
9553 # )
9554
9555 # logger.info(
9556 # _m(
9557 # "Started Container",
9558 # extra=get_extra_info(
9559 # {**default_extra, "payload": str(payload)}
9560 # ),
9561 # ),
9562 # )
9563 # await miner_client.send_model(
9564 # SSHPubKeyRemoveRequest(
9565 # public_key=public_key, executor_id=payload.executor_id
9566 # )
9567 # )
9568
9569 # return ContainerStarted(
9570 # miner_hotkey=payload.miner_hotkey,
9571 # executor_id=payload.executor_id,
9572 # container_name=payload.container_name,
9573 # )
9574 # elif isinstance(payload, ContainerStopRequest):
9575 # await docker_service.stop_container(
9576 # payload,
9577 # executor,
9578 # my_key,
9579 # private_key.decode("utf-8"),
9580 # )
9581 # await miner_client.send_model(
9582 # SSHPubKeyRemoveRequest(
9583 # public_key=public_key, executor_id=payload.executor_id
9584 # )
9585 # )
9586
9587 # return ContainerStopped(
9588 # miner_hotkey=payload.miner_hotkey,
9589 # executor_id=payload.executor_id,
9590 # container_name=payload.container_name,
9591 # )
9592 elif isinstance(payload, ContainerDeleteRequest):
9593 logger.info(
9594 _m(
9595 "Deleting container",
9596 extra=get_extra_info(
9597 {**default_extra, "payload": str(payload)}
9598 ),
9599 ),
9600 )
9601 await docker_service.delete_container(
9602 payload,
9603 executor,
9604 my_key,
9605 private_key.decode("utf-8"),
9606 )
9607
9608 logger.info(
9609 _m(
9610 "Deleted Container",
9611 extra=get_extra_info(
9612 {**default_extra, "payload": str(payload)}
9613 ),
9614 ),
9615 )
9616 await miner_client.send_model(
9617 SSHPubKeyRemoveRequest(
9618 public_key=public_key, executor_id=payload.executor_id
9619 )
9620 )
9621
9622 return ContainerDeleted(
9623 miner_hotkey=payload.miner_hotkey,
9624 executor_id=payload.executor_id,
9625 container_name=payload.container_name,
9626 volume_name=payload.volume_name,
9627 )
9628 else:
9629 logger.error(
9630 _m(
9631 "Unexpected request",
9632 extra=get_extra_info(
9633 {**default_extra, "payload": str(payload)}
9634 ),
9635 ),
9636 )
9637 return FailedContainerRequest(
9638 miner_hotkey=payload.miner_hotkey,
9639 executor_id=payload.executor_id,
9640 msg=f"Unexpected request: {payload}",
9641 error_code=FailedContainerErrorCodes.UnknownError,
9642 )
9643
9644 except Exception as e:
9645 logger.error(
9646 _m(
9647 "Error: create container error",
9648 extra=get_extra_info({**default_extra, "error": str(e)}),
9649 ),
9650 )
9651 await miner_client.send_model(
9652 SSHPubKeyRemoveRequest(
9653 public_key=public_key, executor_id=payload.executor_id
9654 )
9655 )
9656
9657 return FailedContainerRequest(
9658 miner_hotkey=payload.miner_hotkey,
9659 executor_id=payload.executor_id,
9660 msg=f"create container error: {str(e)}",
9661 error_code=FailedContainerErrorCodes.ExceptionError,
9662 )
9663
9664 elif isinstance(msg, FailedRequest):
9665 logger.info(
9666 _m(
9667 "Error: Miner failed job",
9668 extra=get_extra_info({**default_extra, "msg": str(msg)}),
9669 ),
9670 )
9671 return FailedContainerRequest(
9672 miner_hotkey=payload.miner_hotkey,
9673 executor_id=payload.executor_id,
9674 msg=f"Failed request from miner: {str(msg)}",
9675 error_code=FailedContainerErrorCodes.FailedMsgFromMiner,
9676 )
9677 else:
9678 logger.error(
9679 _m(
9680 "Error: Unexpected msg",
9681 extra=get_extra_info({**default_extra, "msg": str(msg)}),
9682 ),
9683 )
9684 return FailedContainerRequest(
9685 miner_hotkey=payload.miner_hotkey,
9686 executor_id=payload.executor_id,
9687 msg=f"Unexpected msg: {str(msg)}",
9688 error_code=FailedContainerErrorCodes.UnknownError,
9689 )
9690 except Exception as e:
9691 log_text = _m(
9692 "[handle_container] resulted in an exception",
9693 extra=get_extra_info({**default_extra, "error": str(e)}),
9694 )
9695
9696 logger.error(log_text, exc_info=True)
9697
9698 return FailedContainerRequest(
9699 miner_hotkey=payload.miner_hotkey,
9700 executor_id=payload.executor_id,
9701 msg=f"Exception: {str(e)}",
9702 error_code=FailedContainerErrorCodes.ExceptionError,
9703 )
9704
9705
9706MinerServiceDep = Annotated[MinerService, Depends(MinerService)]
9707
9708
9709
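# The task results handled above are 7-tuples whose field order is spelled out in
# publish_machine_specs(): (specs, ssh_info, score, synthetic_job_score, job_batch_id,
# log_status, log_text). Below is a minimal sketch of a named tuple that would make the
# positional unpacking (`for _, _, score, _, _, _, _ in results`) self-documenting;
# TaskResult is an illustrative name, not an existing type in this codebase.
from typing import Any, NamedTuple


class TaskResult(NamedTuple):
    specs: dict | None
    executor_info: ExecutorSSHInfo
    score: float
    synthetic_job_score: float
    job_batch_id: str
    log_status: str
    log_text: Any


# With such a type, the total score reduces to: sum(result.score for result in results)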
9710---
9711File: /neurons/validators/src/services/redis_service.py
9712---
9713
9714import json
9715import asyncio
9716import redis.asyncio as aioredis
9717from protocol.vc_protocol.compute_requests import RentedMachine
9718from core.config import settings
9719
9720MACHINE_SPEC_CHANNEL_NAME = "channel:1"
9721STREAMING_LOG_CHANNEL = "channel:2"
9722RENTED_MACHINE_SET = "rented_machines"
9723DUPLICATED_MACHINE_SET = "duplicated_machines"
9724EXECUTOR_COUNT_PREFIX = "executor_counts"
9725AVAILABLE_PORT_MAPS_PREFIX = "available_port_maps"
9726VERIFIED_JOB_COUNT_KEY = "verified_job_counts"
9727
9728
9729class RedisService:
9730 def __init__(self):
9731 self.redis = aioredis.from_url(f"redis://{settings.REDIS_HOST}:{settings.REDIS_PORT}")
9732 self.lock = asyncio.Lock()
9733
9734 async def publish(self, channel: str, message: dict):
9735 """Publish a message to a Redis channel."""
9736 await self.redis.publish(channel, json.dumps(message))
9737
9738 async def subscribe(self, channel: str):
9739 """Subscribe to a Redis channel."""
9740 pubsub = self.redis.pubsub()
9741 await pubsub.subscribe(channel)
9742 return pubsub
9743
9744 async def set(self, key: str, value: str):
9745 """Set a key-value pair in Redis."""
9746 async with self.lock:
9747 await self.redis.set(key, value)
9748
9749 async def get(self, key: str):
9750 """Get a value by key from Redis."""
9751 async with self.lock:
9752 return await self.redis.get(key)
9753
9754 async def delete(self, key: str):
9755 """Remove a key from Redis."""
9756 async with self.lock:
9757 await self.redis.delete(key)
9758
9759 async def sadd(self, key: str, elem: str):
9760 """Add an element to a set in Redis."""
9761 async with self.lock:
9762 await self.redis.sadd(key, elem)
9763
9764 async def srem(self, key: str, elem: str):
9765 """Remove an element from a set in Redis."""
9766 async with self.lock:
9767 await self.redis.srem(key, elem)
9768
9769 async def is_elem_exists_in_set(self, key: str, elem: str) -> bool:
9770        """Check whether an element exists in a set in Redis."""
9771 async with self.lock:
9772 return await self.redis.sismember(key, elem)
9773
9774 async def smembers(self, key: str):
9775 async with self.lock:
9776 return await self.redis.smembers(key)
9777
9778 async def add_rented_machine(self, machine: RentedMachine):
9779 await self.sadd(RENTED_MACHINE_SET, f"{machine.miner_hotkey}:{machine.executor_id}")
9780
9781 async def remove_rented_machine(self, machine: RentedMachine):
9782 await self.srem(RENTED_MACHINE_SET, f"{machine.miner_hotkey}:{machine.executor_id}")
9783
9784 async def lpush(self, key: str, element: bytes):
9785 """Add an element to a list in Redis."""
9786 async with self.lock:
9787 await self.redis.lpush(key, element)
9788
9789 async def lrange(self, key: str) -> list[bytes]:
9790 """Get all elements from a list in Redis in order."""
9791 async with self.lock:
9792 return await self.redis.lrange(key, 0, -1)
9793
9794 async def lrem(self, key: str, element: bytes, count: int = 0):
9795 """Remove elements from a list in Redis."""
9796 async with self.lock:
9797 await self.redis.lrem(key, count, element)
9798
9799 async def ltrim(self, key: str, max_length: int):
9800 """Trim the list to maintain a maximum length."""
9801 async with self.lock:
9802 await self.redis.ltrim(key, 0, max_length - 1)
9803
9804 async def lpop(self, key: str) -> bytes:
9805 """Remove and return the first element (last inserted) from a list in Redis."""
9806 async with self.lock:
9807 return await self.redis.lpop(key)
9808
9809 async def rpop(self, key: str) -> bytes:
9810 """Remove and return the last element (first inserted) from a list in Redis."""
9811 async with self.lock:
9812 return await self.redis.rpop(key)
9813
9814 async def hset(self, key: str, field: str, value: str):
9815 async with self.lock:
9816 await self.redis.hset(key, field, value)
9817
9818 async def hget(self, key: str, field: str):
9819 async with self.lock:
9820 return await self.redis.hget(key, field)
9821
9822 async def hgetall(self, key: str):
9823 async with self.lock:
9824 return await self.redis.hgetall(key)
9825
9826 async def hdel(self, key: str, *fields: str):
9827 async with self.lock:
9828 await self.redis.hdel(key, *fields)
9829
9830 async def clear_by_pattern(self, pattern: str):
9831 async with self.lock:
9832 async for key in self.redis.scan_iter(match=pattern):
9833 await self.redis.delete(key.decode())
9834
9835 async def clear_all_executor_counts(self):
9836 pattern = f"{EXECUTOR_COUNT_PREFIX}:*"
9837 cursor = 0
9838
9839 async with self.lock:
9840 while True:
9841 cursor, keys = await self.redis.scan(cursor, match=pattern, count=100)
9842 if keys:
9843 await self.redis.delete(*keys)
9844 if cursor == 0:
9845 break
9846
9847 async def clear_all_ssh_ports(self):
9848 pattern = f"{AVAILABLE_PORT_MAPS_PREFIX}:*"
9849 await self.clear_by_pattern(pattern)
9850
9851 async def set_verified_job_count(self, executor_id: str, count: int):
9852 data = {
9853 "count": count,
9854 }
9855
9856 await self.hset(VERIFIED_JOB_COUNT_KEY, executor_id, json.dumps(data))
9857
9858 async def get_verified_job_count(self, executor_id: str):
9859 data = await self.hget(VERIFIED_JOB_COUNT_KEY, executor_id)
9860
9861 if not data:
9862 return 0
9863
9864 return json.loads(data)['count']
9865
9866
9867
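# For reference, a minimal usage sketch (the function name is illustrative; the key
# layout matches task_service.py) of the "keep only the latest N port maps" pattern
# built from lrem / lpush / lrange / rpop:
async def remember_port_map(
    redis_service: RedisService,
    miner_hotkey: str,
    executor_uuid: str,
    internal_port: int,
    external_port: int,
    max_entries: int = 10,
):
    key = f"{AVAILABLE_PORT_MAPS_PREFIX}:{miner_hotkey}:{executor_uuid}"
    port_map = f"{internal_port},{external_port}"

    await redis_service.lrem(key=key, element=port_map)  # drop any duplicate entry
    await redis_service.lpush(key, port_map)             # newest entry at the head
    if len(await redis_service.lrange(key)) > max_entries:
        await redis_service.rpop(key)                    # evict the oldest entry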
9868---
9869File: /neurons/validators/src/services/ssh_service.py
9870---
9871
9872import hashlib
9873from base64 import b64encode
9874import random
9875import string
9876
9877from cryptography.fernet import Fernet
9878from cryptography.hazmat.primitives import serialization
9879from cryptography.hazmat.primitives.asymmetric import ed25519
9880
9881
9882class SSHService:
9883 def generate_random_string(self, length=30):
9884 characters = (
9885 string.ascii_letters + string.digits +
9886 "/ "
9887 )
9888 random_string = ''.join(random.choices(characters, k=length))
9889 return random_string
9890
9891 def _hash(self, s: bytes) -> bytes:
9892 return b64encode(hashlib.sha256(s).digest(), altchars=b"-_")
9893
9894 def _encrypt(self, key: str, payload: str) -> str:
9895 key_bytes = self._hash(key.encode("utf-8"))
9896 return Fernet(key_bytes).encrypt(payload.encode("utf-8")).decode("utf-8")
9897
9898 def decrypt_payload(self, key: str, encrypted_payload: str) -> str:
9899 key_bytes = self._hash(key.encode("utf-8"))
9900 return Fernet(key_bytes).decrypt(encrypted_payload.encode("utf-8")).decode("utf-8")
9901
9902    def generate_ssh_key(self, encryption_key: str) -> tuple[bytes, bytes]:
9903 """Generate SSH key pair.
9904
9905 Args:
9906 encryption_key (str): key to encrypt the private key.
9907
9908 Returns:
9909 (bytes, bytes): return (private key bytes, public key bytes)
9910 """
9911 # Generate a new private-public key pair
9912 private_key = ed25519.Ed25519PrivateKey.generate()
9913 public_key = private_key.public_key()
9914
9915 private_key_bytes = private_key.private_bytes(
9916 encoding=serialization.Encoding.PEM,
9917 format=serialization.PrivateFormat.OpenSSH,
9918 # encryption_algorithm=BestAvailableEncryption(encryption_key.encode()),
9919 encryption_algorithm=serialization.NoEncryption(),
9920 )
9921 public_key_bytes = public_key.public_bytes(
9922 encoding=serialization.Encoding.OpenSSH,
9923 format=serialization.PublicFormat.OpenSSH,
9924 )
9925
9926 # extract pub key content, excluding first line and end line
9927 # pub_key_str = "".join(public_key_bytes.decode().split("\n")[1:-2])
9928
9929 return self._encrypt(encryption_key, private_key_bytes.decode("utf-8")).encode(
9930 "utf-8"
9931 ), public_key_bytes
9932
9933
9934
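# A minimal round-trip sketch (the hotkey address below is a placeholder, not a real
# ss58 address): the private key is returned Fernet-encrypted under a SHA-256-derived
# key, so the same string passed as encryption_key must later be handed to
# decrypt_payload, as the validator does in task_service.create_task.
if __name__ == "__main__":
    ssh_service = SSHService()
    hotkey_address = "5ExampleHotkeyAddress"  # placeholder value

    encrypted_private_key, public_key = ssh_service.generate_ssh_key(hotkey_address)
    private_key_pem = ssh_service.decrypt_payload(
        hotkey_address, encrypted_private_key.decode("utf-8")
    )
    print(public_key.decode("utf-8"))
    print(private_key_pem.splitlines()[0])  # "-----BEGIN OPENSSH PRIVATE KEY-----"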
9935---
9936File: /neurons/validators/src/services/task_service.py
9937---
9938
9939import asyncio
9940import json
9941import logging
9942import os
9943import random
9944import time
9945import uuid
9946from typing import Annotated
9947
9948import asyncssh
9949import bittensor
9950from datura.requests.miner_requests import ExecutorSSHInfo
9951from fastapi import Depends
9952from payload_models.payloads import MinerJobEnryptedFiles, MinerJobRequestPayload
9953
9954from core.config import settings
9955from core.utils import _m, context, get_extra_info
9956from services.const import (
9957 DOWNLOAD_SPEED_WEIGHT,
9958 GPU_MAX_SCORES,
9959 JOB_TAKEN_TIME_WEIGHT,
9960 MAX_DOWNLOAD_SPEED,
9961 MAX_UPLOAD_SPEED,
9962 UPLOAD_SPEED_WEIGHT,
9963 MAX_GPU_COUNT,
9964 UNRENTED_MULTIPLIER,
9965 HASHCAT_CONFIGS,
9966 LIB_NVIDIA_ML_DIGESTS,
9967 DOCKER_DIGESTS,
9968 GPU_UTILIZATION_LIMIT,
9969 GPU_MEMORY_UTILIZATION_LIMIT,
9970 VERIFY_JOB_REQUIRED_COUNT,
9971)
9972from services.redis_service import (
9973 RedisService,
9974 RENTED_MACHINE_SET,
9975 DUPLICATED_MACHINE_SET,
9976 AVAILABLE_PORT_MAPS_PREFIX,
9977)
9978from services.ssh_service import SSHService
9979from services.hash_service import HashService
9980
9981logger = logging.getLogger(__name__)
9982
9983JOB_LENGTH = 300
9984
9985
9986class TaskService:
9987 def __init__(
9988 self,
9989 ssh_service: Annotated[SSHService, Depends(SSHService)],
9990 redis_service: Annotated[RedisService, Depends(RedisService)],
9991 ):
9992 self.ssh_service = ssh_service
9993 self.redis_service = redis_service
9994 self.wallet = settings.get_bittensor_wallet()
9995
9996 async def upload_directory(
9997 self, ssh_client: asyncssh.SSHClientConnection, local_dir: str, remote_dir: str
9998 ):
9999 """Uploads a directory recursively to a remote server using AsyncSSH."""
10000 async with ssh_client.start_sftp_client() as sftp_client:
10001 for root, dirs, files in os.walk(local_dir):
10002 relative_dir = os.path.relpath(root, local_dir)
10003 remote_path = os.path.join(remote_dir, relative_dir)
10004
10005 # Create remote directory if it doesn't exist
10006 result = await ssh_client.run(f"mkdir -p {remote_path}")
10007 if result.exit_status != 0:
10008 raise Exception(f"Failed to create directory {remote_path}: {result.stderr}")
10009
10010 # Upload files
10011 upload_tasks = []
10012 for file in files:
10013 local_file = os.path.join(root, file)
10014 remote_file = os.path.join(remote_path, file)
10015 upload_tasks.append(sftp_client.put(local_file, remote_file))
10016
10017 # Await all upload tasks for the current directory
10018 await asyncio.gather(*upload_tasks)
10019
10020 async def is_script_running(
10021 self, ssh_client: asyncssh.SSHClientConnection, script_path: str
10022 ) -> bool:
10023 """
10024 Check if a specific script is running.
10025
10026 Args:
10027 ssh_client: SSH client instance
10028 script_path: Full path to the script (e.g., '/root/app/gpus_utility.py')
10029
10030
10031 Returns:
10032 bool: True if script is running, False otherwise
10033 """
10034 try:
10035 result = await ssh_client.run(f'ps aux | grep "python.*{script_path}"', timeout=10)
10036 # Filter out the grep process itself
10037 processes = [line for line in result.stdout.splitlines() if "grep" not in line]
10038
10039 logger.info(f"{script_path} running status: {bool(processes)}")
10040 return bool(processes)
10041 except Exception as e:
10042 logger.error(f"Error checking {script_path} status: {e}")
10043 return False
10044
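    # A grep-free variant sketch of the check above, assuming `pgrep` is available on
    # the executor (pgrep -f matches the full command line and exits non-zero when
    # nothing matches):
    #
    #   result = await ssh_client.run(f'pgrep -f "python.*{script_path}"', timeout=10)
    #   return result.exit_status == 0 and bool(result.stdout.strip())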
10045 async def start_script(
10046 self,
10047 ssh_client: asyncssh.SSHClientConnection,
10048 script_path: str,
10049 command_args: dict,
10050 executor_info: ExecutorSSHInfo,
10051 ) -> bool:
10052 """
10053 Start a script with specified arguments.
10054
10055 Args:
10056 ssh_client: SSH client instance
10057 script_path: Full path to the script (e.g., '/root/app/gpus_utility.py')
10058 command_args: Dictionary of argument names and values
10059
10060 Returns:
10061 bool: True if script started successfully, False otherwise
10062 """
10063 try:
10064 # Build command string from arguments
10065 args_string = " ".join([f"--{key} {value}" for key, value in command_args.items()])
10066 await ssh_client.run("pip install aiohttp click pynvml psutil", timeout=30)
10067 command = (
10068 f"nohup {executor_info.python_path} {script_path} {args_string} > /dev/null 2>&1 & "
10069 )
10070 # Run the script
10071 result = await ssh_client.run(command, timeout=50, check=True)
10072 logger.info(f"Started {script_path}: {result}")
10073 return True
10074 except Exception as e:
10075 logger.error(f"Error starting script {script_path}: {e}", exc_info=True)
10076 return False
10077
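    # Worked example (hypothetical values) of the command assembled by start_script():
    #
    #   command_args = {"program_id": "1a2b3c", "validator_hotkey": "5Example..."}
    #   args_string  = "--program_id 1a2b3c --validator_hotkey 5Example..."
    #   command      = ("nohup /usr/bin/python3 /root/app/src/gpus_utility.py "
    #                   "--program_id 1a2b3c --validator_hotkey 5Example... > /dev/null 2>&1 & ")
    #
    # where /usr/bin/python3 stands in for executor_info.python_path.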
10078 def validate_digests(self, docker_digests, docker_hub_digests):
10079 # Check if the list is empty
10080 if not docker_digests:
10081 return False
10082
10083 # Get unique digests
10084 unique_digests = list({item["digest"] for item in docker_digests})
10085
10086 # Check for duplicates
10087 if len(unique_digests) != len(docker_digests):
10088 return False
10089
10090 # Check if any digest is invalid
10091 for digest in unique_digests:
10092 if digest not in docker_hub_digests.values():
10093 return False
10094
10095 return True
10096
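    # Illustrative inputs for validate_digests() (digest values are made up):
    #
    #   docker_hub_digests = {"daturaai/compute-subnet-executor:latest": "sha256:aaa"}
    #
    #   [{"digest": "sha256:aaa"}]                            -> True
    #   []                                                    -> False (empty list)
    #   [{"digest": "sha256:aaa"}, {"digest": "sha256:aaa"}]  -> False (duplicate digest)
    #   [{"digest": "sha256:zzz"}]                            -> False (unknown on Docker Hub)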
10097 async def clear_remote_directory(
10098 self, ssh_client: asyncssh.SSHClientConnection, remote_dir: str
10099 ):
10100 try:
10101 await ssh_client.run(f"rm -rf {remote_dir}", timeout=10)
10102 except Exception as e:
10103 logger.error(f"Error clearing remote directory: {e}")
10104
10105 def get_available_port_map(
10106 self,
10107 executor_info: ExecutorSSHInfo,
10108 ) -> tuple[int, int] | None:
10109 if executor_info.port_mappings:
10110 port_mappings: list[tuple[int, int]] = json.loads(executor_info.port_mappings)
10111 port_mappings = [
10112 (internal_port, external_port)
10113 for internal_port, external_port in port_mappings
10114 if internal_port != executor_info.ssh_port
10115 and external_port != executor_info.ssh_port
10116 ]
10117
10118 if not port_mappings:
10119 return None
10120
10121 return random.choice(port_mappings)
10122
10123 if executor_info.port_range:
10124 if "-" in executor_info.port_range:
10125 min_port, max_port = map(
10126 int, (part.strip() for part in executor_info.port_range.split("-"))
10127 )
10128 ports = list(range(min_port, max_port + 1))
10129 else:
10130 ports = list(
10131 map(int, (part.strip() for part in executor_info.port_range.split(",")))
10132 )
10133 else:
10134 # Default range if port_range is empty
10135 ports = list(range(40000, 65536))
10136
10137 ports = [port for port in ports if port != executor_info.ssh_port]
10138
10139 if not ports:
10140 return None
10141
10142 internal_port = random.choice(ports)
10143
10144 return internal_port, internal_port
10145
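    # Worked examples (hypothetical executor settings) of get_available_port_map():
    #
    #   port_mappings='[[2200, 48000], [2201, 48001]]', ssh_port=2200
    #       -> (2200, 48000) is filtered out; (2201, 48001) is returned.
    #   port_range="40000-40005", ssh_port=40002
    #       -> one of [40000, 40001, 40003, 40004, 40005] is picked and returned as (port, port).
    #   port_range="4444, 5555"
    #       -> candidate ports are [4444, 5555].
    #   port_range="" and no port_mappings
    #       -> defaults to ports 40000-65535 (minus the ssh port).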
10146 async def docker_connection_check(
10147 self,
10148 ssh_client: asyncssh.SSHClientConnection,
10149 job_batch_id: str,
10150 miner_hotkey: str,
10151 executor_info: ExecutorSSHInfo,
10152 private_key: str,
10153 public_key: str,
10154 ):
10155 port_map = self.get_available_port_map(executor_info)
10156 if port_map is None:
10157 log_text = _m(
10158 "No port available for docker container",
10159 extra=get_extra_info(
10160 {
10161 "job_batch_id": job_batch_id,
10162 "miner_hotkey": miner_hotkey,
10163 "executor_uuid": executor_info.uuid,
10164 "executor_ip_address": executor_info.address,
10165 "executor_port": executor_info.port,
10166 "ssh_username": executor_info.ssh_username,
10167 "ssh_port": executor_info.ssh_port,
10168 "version": settings.VERSION
10169 }
10170 ),
10171 )
10172 log_status = "error"
10173 logger.error(log_text, exc_info=True)
10174
10175 return False, log_text, log_status
10176
10177 internal_port, external_port = port_map
10178 executor_name = f"{executor_info.uuid}_{executor_info.address}_{executor_info.port}"
10179 default_extra = {
10180 "job_batch_id": job_batch_id,
10181 "miner_hotkey": miner_hotkey,
10182 "executor_uuid": executor_info.uuid,
10183 "executor_ip_address": executor_info.address,
10184 "executor_port": executor_info.port,
10185 "ssh_username": executor_info.ssh_username,
10186 "ssh_port": executor_info.ssh_port,
10187 "internal_port": internal_port,
10188 "external_port": external_port,
10189 "version": settings.VERSION,
10190 }
10191 context.set(f"[_docker_connection_check][{executor_name}]")
10192
10193 container_name = f"container_{miner_hotkey}"
10194
10195 try:
10196 result = await ssh_client.run(f"docker ps -q -f name={container_name}")
10197 if result.stdout.strip():
10198 command = f"docker rm {container_name} -f"
10199 await ssh_client.run(command)
10200
10201 log_text = _m(
10202 "Creating docker container",
10203 extra=default_extra,
10204 )
10205 log_status = "info"
10206 logger.info(log_text)
10207
10208 docker_cmd = f"sh -c 'mkdir -p ~/.ssh && echo \"{public_key}\" >> ~/.ssh/authorized_keys && ssh-keygen -A && service ssh start && tail -f /dev/null'"
10209 command = f"docker run -d --name {container_name} -p {internal_port}:22 daturaai/compute-subnet-executor:latest {docker_cmd}"
10210
10211 result = await ssh_client.run(command)
10212 if result.exit_status != 0:
10213 error_message = result.stderr.strip() if result.stderr else "No error message available"
10214 log_text = _m(
10215 "Error creating docker connection",
10216 extra=get_extra_info({
10217 **default_extra,
10218 "error": error_message
10219 }),
10220 )
10221 log_status = "error"
10222 logger.error(log_text, exc_info=True)
10223
10224 try:
10225 command = f"docker rm {container_name} -f"
10226 await ssh_client.run(command)
10227 except Exception as e:
10228 logger.error(f"Error removing docker container: {e}")
10229
10230 return False, log_text, log_status
10231
10232 await asyncio.sleep(3)
10233
10234 pkey = asyncssh.import_private_key(private_key)
10235 async with asyncssh.connect(
10236 host=executor_info.address,
10237 port=external_port,
10238 username=executor_info.ssh_username,
10239 client_keys=[pkey],
10240 known_hosts=None,
10241 ) as _:
10242 log_text = _m(
10243 "Connected into docker container",
10244 extra=default_extra,
10245 )
10246 logger.info(log_text)
10247
10248 # set port on redis
10249 key = f"{AVAILABLE_PORT_MAPS_PREFIX}:{miner_hotkey}:{executor_info.uuid}"
10250 port_map = f"{internal_port},{external_port}"
10251
10252 # delete all the same port_maps in the list
10253 await self.redis_service.lrem(key=key, element=port_map)
10254
10255 # insert port_map in the list
10256 await self.redis_service.lpush(key, port_map)
10257
10258 # keep the latest 10 port maps
10259 port_maps = await self.redis_service.lrange(key)
10260 if len(port_maps) > 10:
10261 await self.redis_service.rpop(key)
10262
10263 command = f"docker rm {container_name} -f"
10264 await ssh_client.run(command)
10265
10266 return True, log_text, log_status
10267 except Exception as e:
10268 log_text = _m(
10269                "Error connecting to docker container",
10270 extra=get_extra_info({**default_extra, "error": str(e)}),
10271 )
10272 log_status = "error"
10273 logger.error(log_text, exc_info=True)
10274
10275 try:
10276 command = f"docker rm {container_name} -f"
10277 await ssh_client.run(command)
10278 except Exception as e:
10279 logger.error(f"Error removing docker container: {e}")
10280
10281 return False, log_text, log_status
10282
10283 async def clear_verified_job_count(self, executor_info: ExecutorSSHInfo):
10284 await self.redis_service.set_verified_job_count(executor_info.uuid, 0)
10285
10286 async def create_task(
10287 self,
10288 miner_info: MinerJobRequestPayload,
10289 executor_info: ExecutorSSHInfo,
10290 keypair: bittensor.Keypair,
10291 private_key: str,
10292 public_key: str,
10293 encypted_files: MinerJobEnryptedFiles,
10294 docker_hub_digests: dict[str, str],
10295 debug: bool = False,
10296 ):
10297 default_extra = {
10298 "job_batch_id": miner_info.job_batch_id,
10299 "miner_hotkey": miner_info.miner_hotkey,
10300 "executor_uuid": executor_info.uuid,
10301 "executor_ip_address": executor_info.address,
10302 "executor_port": executor_info.port,
10303 "executor_ssh_username": executor_info.ssh_username,
10304 "executor_ssh_port": executor_info.ssh_port,
10305 "version": settings.VERSION,
10306 }
10307 try:
10308 logger.info(_m("Start job on an executor", extra=get_extra_info(default_extra)))
10309
10310 private_key = self.ssh_service.decrypt_payload(keypair.ss58_address, private_key)
10311 pkey = asyncssh.import_private_key(private_key)
10312
10313 async with asyncssh.connect(
10314 host=executor_info.address,
10315 port=executor_info.ssh_port,
10316 username=executor_info.ssh_username,
10317 client_keys=[pkey],
10318 known_hosts=None,
10319 ) as ssh_client:
10320 remote_dir = f"{executor_info.root_dir}/temp"
10321 await ssh_client.run(f"rm -rf {remote_dir}")
10322 await ssh_client.run(f"mkdir -p {remote_dir}")
10323
10324 # start gpus_utility.py
10325 program_id = str(uuid.uuid4())
10326 command_args = {
10327 "program_id": program_id,
10328 "signature": f"0x{keypair.sign(program_id.encode()).hex()}",
10329 "executor_id": executor_info.uuid,
10330 "validator_hotkey": keypair.ss58_address,
10331 "compute_rest_app_url": settings.COMPUTE_REST_API_URL,
10332 }
10333 script_path = f"{executor_info.root_dir}/src/gpus_utility.py"
10334 if not await self.is_script_running(ssh_client, script_path):
10335 await self.start_script(ssh_client, script_path, command_args, executor_info)
10336
10337 if debug is True:
10338 logger.info("Debug mode is enabled. Skipping other tasks.")
10339 return (
10340 None,
10341 executor_info,
10342 0,
10343 0,
10344 miner_info.job_batch_id,
10345 "info",
10346 "Debug mode is enabled. Skipping other tasks.",
10347 )
10348
10349 # upload temp directory
10350 await self.upload_directory(ssh_client, encypted_files.tmp_directory, remote_dir)
10351
10352 remote_machine_scrape_file_path = (
10353 f"{remote_dir}/{encypted_files.machine_scrape_file_name}"
10354 )
10355 remote_score_file_path = f"{remote_dir}/{encypted_files.score_file_name}"
10356
10357 logger.info(
10358 _m(
10359 "Uploaded files to run job",
10360 extra=get_extra_info(default_extra),
10361 ),
10362 )
10363
10364 machine_specs, _ = await self._run_task(
10365 ssh_client=ssh_client,
10366 miner_hotkey=miner_info.miner_hotkey,
10367 executor_info=executor_info,
10368 command=f"chmod +x {remote_machine_scrape_file_path} && {remote_machine_scrape_file_path}",
10369 )
10370 if not machine_specs:
10371 log_status = "warning"
10372 log_text = _m("No machine specs found", extra=get_extra_info(default_extra))
10373 logger.warning(log_text)
10374
10375 await self.clear_remote_directory(ssh_client, remote_dir)
10376 await self.clear_verified_job_count(executor_info)
10377
10378 return (
10379 None,
10380 executor_info,
10381 0,
10382 0,
10383 miner_info.job_batch_id,
10384 log_status,
10385 log_text,
10386 )
10387
10388 machine_spec = json.loads(
10389 self.ssh_service.decrypt_payload(
10390 encypted_files.encrypt_key, machine_specs[0].strip()
10391 )
10392 )
10393
10394 gpu_model = None
10395 if machine_spec.get("gpu", {}).get("count", 0) > 0:
10396 details = machine_spec["gpu"].get("details", [])
10397 if len(details) > 0:
10398 gpu_model = details[0].get("name", None)
10399
10400 max_score = 0
10401 if gpu_model:
10402 max_score = GPU_MAX_SCORES.get(gpu_model, 0)
10403
10404 gpu_count = machine_spec.get("gpu", {}).get("count", 0)
10405 gpu_details = machine_spec.get("gpu", {}).get("details", [])
10406
10407 nvidia_driver = machine_spec.get("gpu", {}).get("driver", "")
10408 libnvidia_ml = machine_spec.get("md5_checksums", {}).get("libnvidia_ml", "")
10409
10410 docker_version = machine_spec.get("docker", {}).get("version", "")
10411 docker_digest = machine_spec.get("md5_checksums", {}).get("docker", "")
10412
10413 ram = machine_spec.get("ram", {}).get("total", 0)
10414 storage = machine_spec.get("hard_disk", {}).get("free", 0)
10415
10416 gpu_processes = machine_spec.get("gpu_processes", [])
10417
10418 vram = 0
10419 for detail in gpu_details:
10420 vram += detail.get("capacity", 0) * 1024
10421
10422 logger.info(
10423 _m(
10424 "Machine spec scraped",
10425 extra=get_extra_info(
10426 {
10427 **default_extra,
10428 "gpu_model": gpu_model,
10429 "gpu_count": gpu_count,
10430 "nvidia_driver": nvidia_driver,
10431 "libnvidia_ml": libnvidia_ml,
10432 }
10433 ),
10434 ),
10435 )
10436
10437 if gpu_count > MAX_GPU_COUNT:
10438 log_status = "warning"
10439 log_text = _m(
10440 f"GPU count({gpu_count}) is greater than the maximum allowed ({MAX_GPU_COUNT}).",
10441 extra=get_extra_info(default_extra),
10442 )
10443 logger.warning(log_text)
10444
10445 await self.clear_remote_directory(ssh_client, remote_dir)
10446 await self.clear_verified_job_count(executor_info)
10447
10448 return (
10449 machine_spec,
10450 executor_info,
10451 0,
10452 0,
10453 miner_info.job_batch_id,
10454 log_status,
10455 log_text,
10456 )
10457
10458 if max_score == 0 or gpu_count == 0 or len(gpu_details) != gpu_count:
10459 extra_info = {
10460 **default_extra,
10461 "os_version": machine_spec.get("os", ""),
10462 "nvidia_cfg": machine_spec.get("nvidia_cfg", ""),
10463 "docker_cfg": machine_spec.get("docker_cfg", ""),
10464 "gpu_scrape_error": machine_spec.get("gpu_scrape_error", ""),
10465 "nvidia_cfg_scrape_error": machine_spec.get("nvidia_cfg_scrape_error", ""),
10466 "docker_cfg_scrape_error": machine_spec.get("docker_cfg_scrape_error", ""),
10467 }
10468 if gpu_model:
10469 extra_info["gpu_model"] = gpu_model
10470 extra_info["help_text"] = (
10471                            "If you have a GPU machine and are encountering this issue consistently, "
10472 "then please pull the latest version of github repository and follow the installation guide here: "
10473 "https://github.com/Datura-ai/compute-subnet/tree/main/neurons/executor. "
10474 "Also, please configure the nvidia-container-runtime correctly. Check out here: "
10475 "https://stackoverflow.com/questions/72932940/failed-to-initialize-nvml-unknown-error-in-docker-after-few-hours "
10476 "https://bobcares.com/blog/docker-failed-to-initialize-nvml-unknown-error/"
10477 )
10478
10479 log_text = _m(
10480 f"Max Score({max_score}) or GPU count({gpu_count}) is 0. No need to run job.",
10481 extra=get_extra_info(
10482 {
10483 **default_extra,
10484 **extra_info,
10485 }
10486 ),
10487 )
10488 log_status = "warning"
10489 logger.warning(log_text)
10490
10491 await self.clear_remote_directory(ssh_client, remote_dir)
10492 await self.clear_verified_job_count(executor_info)
10493
10494 return (
10495 machine_spec,
10496 executor_info,
10497 0,
10498 0,
10499 miner_info.job_batch_id,
10500 log_status,
10501 log_text,
10502 )
10503
10504 if not docker_version or DOCKER_DIGESTS.get(docker_version) != docker_digest:
10505 log_status = "warning"
10506 log_text = _m(
10507 "Docker is altered",
10508 extra=get_extra_info(
10509 {
10510 **default_extra,
10511 "docker_version": docker_version,
10512 "docker_digest": docker_digest,
10513 }
10514 ),
10515 )
10516 logger.warning(log_text)
10517
10518 await self.clear_remote_directory(ssh_client, remote_dir)
10519 await self.clear_verified_job_count(executor_info)
10520
10521 return (
10522 machine_spec,
10523 executor_info,
10524 0,
10525 0,
10526 miner_info.job_batch_id,
10527 log_status,
10528 log_text,
10529 )
10530
10531 if nvidia_driver and LIB_NVIDIA_ML_DIGESTS.get(nvidia_driver) != libnvidia_ml:
10532 log_status = "warning"
10533 log_text = _m(
10534 "Nvidia driver is altered",
10535 extra=get_extra_info(
10536 {
10537 **default_extra,
10538 "gpu_model": gpu_model,
10539 "gpu_count": gpu_count,
10540 "nvidia_driver": nvidia_driver,
10541 "libnvidia_ml": libnvidia_ml,
10542 }
10543 ),
10544 )
10545 logger.warning(log_text)
10546
10547 await self.clear_remote_directory(ssh_client, remote_dir)
10548 await self.clear_verified_job_count(executor_info)
10549
10550 return (
10551 machine_spec,
10552 executor_info,
10553 0,
10554 0,
10555 miner_info.job_batch_id,
10556 log_status,
10557 log_text,
10558 )
10559
10560 for process in gpu_processes:
10561 container_name = process.get('container_name', None)
10562 if not container_name:
10563 log_status = "warning"
10564 log_text = _m(
10565                        "GPU is in use by another process",
10566 extra=get_extra_info(
10567 {
10568 **default_extra,
10569 "gpu_model": gpu_model,
10570 "gpu_count": gpu_count,
10571 **process,
10572 }
10573 ),
10574 )
10575 logger.warning(log_text)
10576
10577 await self.clear_remote_directory(ssh_client, remote_dir)
10578 await self.clear_verified_job_count(executor_info)
10579
10580 return (
10581 machine_spec,
10582 executor_info,
10583 0,
10584 0,
10585 miner_info.job_batch_id,
10586 log_status,
10587 log_text,
10588 )
10589
10590 # if ram < vram * 0.9 or storage < vram * 1.5:
10591 # log_status = "warning"
10592 # log_text = _m(
10593 # "Incorrect vram",
10594 # extra=get_extra_info(
10595 # {
10596 # **default_extra,
10597 # "gpu_model": gpu_model,
10598 # "gpu_count": gpu_count,
10599 # "memory": ram,
10600 # "vram": vram,
10601 # "storage": storage,
10602 # "nvidia_driver": nvidia_driver,
10603 # "libnvidia_ml": libnvidia_ml,
10604 # }
10605 # ),
10606 # )
10607 # logger.warning(log_text)
10608
10609 # await self.clear_remote_directory(ssh_client, remote_dir)
10610 # await self.clear_verified_job_count(executor_info)
10611
10612 # return (
10613 # machine_spec,
10614 # executor_info,
10615 # 0,
10616 # 0,
10617 # miner_info.job_batch_id,
10618 # log_status,
10619 # log_text,
10620 # )
10621
10622 logger.info(
10623 _m(
10624 f"Got GPU specs: {gpu_model} with max score: {max_score}",
10625 extra=get_extra_info(default_extra),
10626 ),
10627 )
10628
10629 # check duplicated
10630 is_duplicated = await self.redis_service.is_elem_exists_in_set(
10631 DUPLICATED_MACHINE_SET, f"{miner_info.miner_hotkey}:{executor_info.uuid}"
10632 )
10633 if is_duplicated:
10634 log_status = "warning"
10635 log_text = _m(
10636 f"Executor is duplicated",
10637 extra=get_extra_info(default_extra),
10638 )
10639 logger.warning(log_text)
10640
10641 await self.clear_remote_directory(ssh_client, remote_dir)
10642 await self.clear_verified_job_count(executor_info)
10643
10644 return (
10645 machine_spec,
10646 executor_info,
10647 0,
10648 0,
10649 miner_info.job_batch_id,
10650 log_status,
10651 log_text,
10652 )
10653
10654 # check rented status
10655 is_rented = await self.redis_service.is_elem_exists_in_set(
10656 RENTED_MACHINE_SET, f"{miner_info.miner_hotkey}:{executor_info.uuid}"
10657 )
10658 if is_rented:
10659 score = max_score * gpu_count
10660 log_text = _m(
10661 "Executor is already rented.",
10662 extra=get_extra_info({**default_extra, "score": score}),
10663 )
10664 log_status = "info"
10665 logger.info(log_text)
10666
10667 await self.clear_remote_directory(ssh_client, remote_dir)
10668
10669 return (
10670 machine_spec,
10671 executor_info,
10672 score,
10673 0,
10674 miner_info.job_batch_id,
10675 log_status,
10676 log_text,
10677 )
10678 else:
10679 # check gpu usages
10680 for detail in gpu_details:
10681 gpu_utilization = detail.get("gpu_utilization", GPU_UTILIZATION_LIMIT)
10682 gpu_memory_utilization = detail.get("memory_utilization", GPU_MEMORY_UTILIZATION_LIMIT)
10683 if gpu_utilization >= GPU_UTILIZATION_LIMIT or gpu_memory_utilization > GPU_MEMORY_UTILIZATION_LIMIT:
10684 log_status = "warning"
10685 log_text = _m(
10686 f"High gpu utilization detected:",
10687 extra=get_extra_info({
10688 **default_extra,
10689 "gpu_utilization": gpu_utilization,
10690 "gpu_memory_utilization": gpu_memory_utilization,
10691 }),
10692 )
10693 logger.warning(log_text)
10694
10695 await self.clear_remote_directory(ssh_client, remote_dir)
10696 await self.clear_verified_job_count(executor_info)
10697
10698 return (
10699 machine_spec,
10700 executor_info,
10701 0,
10702 0,
10703 miner_info.job_batch_id,
10704 log_status,
10705 log_text,
10706 )
10707
10708 # if not rented, check renting ports
10709 success, log_text, log_status = await self.docker_connection_check(
10710 ssh_client=ssh_client,
10711 job_batch_id=miner_info.job_batch_id,
10712 miner_hotkey=miner_info.miner_hotkey,
10713 executor_info=executor_info,
10714 private_key=private_key,
10715 public_key=public_key,
10716 )
10717 if not success:
10718 await self.clear_remote_directory(ssh_client, remote_dir)
10719 await self.clear_verified_job_count(executor_info)
10720
10721 return (
10722 None,
10723 executor_info,
10724 0,
10725 0,
10726 miner_info.job_batch_id,
10727 log_status,
10728 log_text,
10729 )
10730
10731 # if not rented, check docker digests
10732 docker_digests = machine_spec.get("docker", {}).get("containers", [])
10733 is_docker_valid = self.validate_digests(docker_digests, docker_hub_digests)
10734 if not is_docker_valid:
10735 log_text = _m(
10736 "Docker digests are not valid",
10737 extra=get_extra_info(
10738 {**default_extra, "docker_digests": docker_digests}
10739 ),
10740 )
10741 log_status = "error"
10742
10743 logger.warning(log_text)
10744
10745 await self.clear_remote_directory(ssh_client, remote_dir)
10746 await self.clear_verified_job_count(executor_info)
10747
10748 return (
10749 None,
10750 executor_info,
10751 0,
10752 0,
10753 miner_info.job_batch_id,
10754 log_status,
10755 log_text,
10756 )
10757
10758 # scoring
10759                    hashcat_config = HASHCAT_CONFIGS.get(gpu_model)
10760 if not hashcat_config:
10761 log_text = _m(
10762 "No config for hashcat",
10763 extra=get_extra_info(default_extra),
10764 )
10765 log_status = "error"
10766
10767 logger.warning(log_text)
10768
10769 await self.clear_remote_directory(ssh_client, remote_dir)
10770 await self.clear_verified_job_count(executor_info)
10771
10772 return (
10773 None,
10774 executor_info,
10775 0,
10776 0,
10777 miner_info.job_batch_id,
10778 log_status,
10779 log_text,
10780 )
10781
10782 num_digits = hashcat_config.get("digits", 11)
10783 avg_job_time = (
10784 hashcat_config.get("average_time")[gpu_count - 1 if gpu_count <= 8 else 7]
10785 if hashcat_config.get("average_time")
10786 else 60
10787 )
10788 hash_service = HashService.generate(
10789 gpu_count=gpu_count, num_digits=num_digits, timeout=int(avg_job_time * 2.5)
10790 )
10791 start_time = time.time()
10792
10793 results, err = await self._run_task(
10794 ssh_client=ssh_client,
10795 miner_hotkey=miner_info.miner_hotkey,
10796 executor_info=executor_info,
10797 command=f"export PYTHONPATH={executor_info.root_dir}:$PYTHONPATH && {executor_info.python_path} {remote_score_file_path} '{hash_service.payload}'",
10798 )
10799 if not results:
10800 log_text = _m(
10801 "No result from training job task.",
10802 extra=get_extra_info({
10803 **default_extra,
10804 "error": str(err)
10805 }),
10806 )
10807 log_status = "warning"
10808 logger.warning(log_text)
10809
10810 await self.clear_remote_directory(ssh_client, remote_dir)
10811 await self.clear_verified_job_count(executor_info)
10812
10813 return (
10814 machine_spec,
10815 executor_info,
10816 0,
10817 0,
10818 miner_info.job_batch_id,
10819 log_status,
10820 log_text,
10821 )
10822
10823 end_time = time.time()
10824 job_taken_time = end_time - start_time
10825
10826 result = json.loads(results[0])
10827 answer = result["answer"]
10828
10829 score = 0
10830
10831 logger.info(
10832 _m(
10833 f"Results from training job task: {str(result)}",
10834 extra=get_extra_info(default_extra),
10835 ),
10836 )
10837 log_text = ""
10838 log_status = ""
10839
10840 if err is not None:
10841 log_status = "error"
10842 log_text = _m(
10843 f"Error executing task on executor: {err}",
10844 extra=get_extra_info(default_extra),
10845 )
10846 logger.error(log_text)
10847
10848 await self.clear_remote_directory(ssh_client, remote_dir)
10849 await self.clear_verified_job_count(executor_info)
10850
10851 return (
10852 machine_spec,
10853 executor_info,
10854 0,
10855 0,
10856 miner_info.job_batch_id,
10857 log_status,
10858 log_text,
10859 )
10860
10861 elif answer != hash_service.answer:
10862 log_status = "error"
10863 log_text = _m(
10864                    "Incorrect hashcat answer",
10865 extra=get_extra_info({**default_extra, "answer": answer, "hash_service_answer": hash_service.answer}),
10866 )
10867 logger.error(log_text)
10868
10869 await self.clear_remote_directory(ssh_client, remote_dir)
10870 await self.clear_verified_job_count(executor_info)
10871
10872 return (
10873 machine_spec,
10874 executor_info,
10875 0,
10876 0,
10877 miner_info.job_batch_id,
10878 log_status,
10879 log_text,
10880 )
10881
10882 # elif job_taken_time > avg_job_time * 2:
10883 # log_status = "error"
10884 # log_text = _m(
10885 # f"Incorrect Answer",
10886 # extra=get_extra_info(default_extra),
10887 # )
10888 # logger.error(log_text)
10889
10890 else:
10891 verified_job_count = await self.redis_service.get_verified_job_count(executor_info.uuid)
10892 verified_job_count += 1
10893
10894 logger.info(
10895 _m(
10896 "Job taken time for executor",
10897 extra=get_extra_info({
10898 **default_extra,
10899 "job_taken_time": job_taken_time,
10900 "verified_job_count": verified_job_count,
10901 }),
10902 ),
10903 )
10904
10905 upload_speed = machine_spec.get("network", {}).get("upload_speed", 0)
10906 download_speed = machine_spec.get("network", {}).get("download_speed", 0)
10907
10908 # Ensure upload_speed and download_speed are not None
10909 upload_speed = upload_speed if upload_speed is not None else 0
10910 download_speed = download_speed if download_speed is not None else 0
10911
10912 job_taken_score = (
10913 min(avg_job_time * 0.7 / job_taken_time, 1) if job_taken_time > 0 else 0
10914 )
10915 upload_speed_score = min(upload_speed / MAX_UPLOAD_SPEED, 1)
10916 download_speed_score = min(download_speed / MAX_DOWNLOAD_SPEED, 1)
10917
10918 score = (
10919 max_score
10920 * gpu_count
10921 * UNRENTED_MULTIPLIER
10922 * (
10923 job_taken_score * JOB_TAKEN_TIME_WEIGHT
10924 + upload_speed_score * UPLOAD_SPEED_WEIGHT
10925 + download_speed_score * DOWNLOAD_SPEED_WEIGHT
10926 )
10927 )
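            # Editor's note: a worked example of the score formula above, using hypothetical
            # constants max_score=1.0, UNRENTED_MULTIPLIER=1.0, JOB_TAKEN_TIME_WEIGHT=0.5,
            # UPLOAD_SPEED_WEIGHT=0.25, DOWNLOAD_SPEED_WEIGHT=0.25 (the real values live in the
            # validator's settings/constants, not here). For gpu_count=2, job_taken_score=0.8,
            # upload_speed_score=1.0, download_speed_score=0.5:
            #   score = 1.0 * 2 * 1.0 * (0.8*0.5 + 1.0*0.25 + 0.5*0.25) = 2 * 0.775 = 1.55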
10928
10929 log_status = "info"
10930 log_text = _m(
10931 "Train task finished",
10932 extra=get_extra_info(
10933 {
10934 **default_extra,
10935 "job_score": score,
10936                        "actual_score": score if verified_job_count >= VERIFY_JOB_REQUIRED_COUNT else 0,
10937 "job_taken_time": job_taken_time,
10938 "upload_speed": upload_speed,
10939 "download_speed": download_speed,
10940 "gpu_model": gpu_model,
10941 "gpu_count": gpu_count,
10942 "verified_job_count": verified_job_count,
10943 "remaining_jobs_before_emission": 0 if verified_job_count >= VERIFY_JOB_REQUIRED_COUNT else VERIFY_JOB_REQUIRED_COUNT - verified_job_count,
10944 "unrented_multiplier": UNRENTED_MULTIPLIER,
10945 }
10946 ),
10947 )
10948
10949 logger.info(log_text)
10950
10951 logger.info(
10952 _m(
10953 "SSH connection closed for executor",
10954 extra=get_extra_info(default_extra),
10955 ),
10956 )
10957
10958 await self.clear_remote_directory(ssh_client, remote_dir)
10959 await self.redis_service.set_verified_job_count(executor_info.uuid, verified_job_count)
10960
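            # Editor's note: emission gating. The computed score is always reported as the job
            # score, but it only counts toward emissions once this executor has accumulated
            # VERIFY_JOB_REQUIRED_COUNT verified jobs in Redis; until then the first score
            # element returned below is 0 while the verified-job counter keeps growing.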
10961 if verified_job_count >= VERIFY_JOB_REQUIRED_COUNT:
10962 return (
10963 machine_spec,
10964 executor_info,
10965 score,
10966 score,
10967 miner_info.job_batch_id,
10968 log_status,
10969 log_text,
10970 )
10971 else:
10972 return (
10973 machine_spec,
10974 executor_info,
10975 0,
10976 score,
10977 miner_info.job_batch_id,
10978 log_status,
10979 log_text,
10980 )
10981 except Exception as e:
10982 log_status = "error"
10983 log_text = _m(
10984 "Error creating task for executor",
10985 extra=get_extra_info({**default_extra, "error": str(e)}),
10986 )
10987
10988 try:
10989 await self.clear_verified_job_count(executor_info)
10990
10991 key = f"{AVAILABLE_PORT_MAPS_PREFIX}:{miner_info.miner_hotkey}:{executor_info.uuid}"
10992 await self.redis_service.delete(key)
10993 except Exception as redis_error:
10994 log_text = _m(
10995 "Error creating task redis_reset_error",
10996 extra=get_extra_info(
10997 {
10998 **default_extra,
10999 "error": str(e),
11000 "redis_reset_error": str(redis_error),
11001 }
11002 ),
11003 )
11004
11005 logger.error(
11006 log_text,
11007 exc_info=True,
11008 )
11009
11010 return (
11011 None,
11012 executor_info,
11013 0,
11014 0,
11015 miner_info.job_batch_id,
11016 log_status,
11017 log_text,
11018 )
11019
11020 async def _run_task(
11021 self,
11022 ssh_client: asyncssh.SSHClientConnection,
11023 miner_hotkey: str,
11024 executor_info: ExecutorSSHInfo,
11025 command: str,
11026 timeout: int = JOB_LENGTH,
11027 ) -> tuple[list[str] | None, str | None]:
11028 try:
11029 executor_name = f"{executor_info.uuid}_{executor_info.address}_{executor_info.port}"
11030 default_extra = {
11031 "executor_uuid": executor_info.uuid,
11032 "executor_ip_address": executor_info.address,
11033 "executor_port": executor_info.port,
11034 "miner_hotkey": miner_hotkey,
11035 "command": command[:100] + ("..." if len(command) > 100 else ""),
11036 "version": settings.VERSION,
11037 }
11038 context.set(f"[_run_task][{executor_name}]")
11039 logger.info(
11040 _m(
11041 "Running task for executor",
11042 extra=default_extra,
11043 ),
11044 )
11045 result = await ssh_client.run(command, timeout=timeout)
11046 results = result.stdout.splitlines()
11047 errors = result.stderr.splitlines()
11048
11049            actual_errors = [error for error in errors if "warning" not in error.lower()]  # drop warning lines from stderr
11050
11051 if len(results) == 0 and len(actual_errors) > 0:
11052 logger.error(_m("Failed to execute command!", extra=get_extra_info({**default_extra, "errors": actual_errors})))
11053 raise Exception("Failed to execute command!")
11054
11055 return results, None
11056 except Exception as e:
11057 logger.error(
11058                _m("Error running task on executor", extra=get_extra_info(default_extra)),
11059 exc_info=True,
11060 )
11061
11062 return None, str(e)
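        # Editor's note: an illustrative (hypothetical) call of _run_task from other service
        # code; the command string is only an example and is not part of the audited source:
        #
        #     results, err = await self._run_task(
        #         ssh_client=ssh_client,
        #         miner_hotkey=miner_info.miner_hotkey,
        #         executor_info=executor_info,
        #         command="nvidia-smi --query-gpu=name --format=csv,noheader",
        #     )
        #     if err is None:
        #         gpu_names = results  # stdout lines from the remote executor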
11063
11064
11065TaskServiceDep = Annotated[TaskService, Depends(TaskService)]
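# Editor's note: TaskServiceDep is a FastAPI dependency alias. A hypothetical route (not in
# the audited source) would consume it like this, letting FastAPI construct TaskService:
#
#     from fastapi import APIRouter
#
#     router = APIRouter()
#
#     @router.post("/debug/create-task")  # hypothetical path
#     async def debug_create_task(task_service: TaskServiceDep):
#         ...  # task_service is an injected TaskService instance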
11066
11067
11068
11069---
11070File: /neurons/validators/src/cli.py
11071---
11072
11073import asyncio
11074import logging
11075import random
11076import time
11077import uuid
11078
11079import click
11080from datura.requests.miner_requests import ExecutorSSHInfo
11081
11082from core.utils import configure_logs_of_other_modules
11083from core.validator import Validator
11084from services.ioc import ioc
11085from services.miner_service import MinerService
11086from services.docker_service import DockerService, REPOSITORIES
11087from services.file_encrypt_service import FileEncryptService
11088from payload_models.payloads import (
11089 MinerJobRequestPayload,
11090 ContainerCreateRequest,
11091 CustomOptions,
11092)
11093
11094configure_logs_of_other_modules()
11095logger = logging.getLogger(__name__)
11096
11097
11098@click.group()
11099def cli():
11100 pass
11101
11102
11103@cli.command()
11104@click.option("--miner_hotkey", prompt="Miner Hotkey", help="Hotkey of Miner")
11105@click.option("--miner_address", prompt="Miner Address", help="Miner IP Address")
11106@click.option("--miner_port", type=int, prompt="Miner Port", help="Miner Port")
11107def debug_send_job_to_miner(miner_hotkey: str, miner_address: str, miner_port: int):
11108 """Debug sending job to miner"""
11109 miner = type("Miner", (object,), {})()
11110 miner.hotkey = miner_hotkey
11111 miner.axon_info = type("AxonInfo", (object,), {})()
11112 miner.axon_info.ip = miner_address
11113 miner.axon_info.port = miner_port
11114 validator = Validator(debug_miner=miner)
11115 asyncio.run(validator.sync())
11116
11117
11118def generate_random_ip():
11119 return ".".join(str(random.randint(0, 255)) for _ in range(4))
11120
11121
11122@cli.command()
11123def debug_send_machine_specs_to_connector():
11124 """Debug sending machine specs to connector"""
11125 miner_service: MinerService = ioc["MinerService"]
11126 counter = 0
11127
11128 while counter < 10:
11129 counter += 1
11130 debug_specs = {
11131 "gpu": {
11132 "count": 1,
11133 "details": [
11134 {
11135 "name": "NVIDIA RTX A5000",
11136 "driver": "555.42.06",
11137 "capacity": "24564",
11138 "cuda": "8.6",
11139 "power_limit": "230.00",
11140 "graphics_speed": "435",
11141 "memory_speed": "5000",
11142 "pcei": "16",
11143 }
11144 ],
11145 },
11146 "cpu": {"count": 128, "model": "AMD EPYC 7452 32-Core Processor", "clocks": []},
11147 "ram": {
11148 "available": 491930408,
11149 "free": 131653212,
11150 "total": 528012784,
11151 "used": 396359572,
11152 },
11153 "hard_disk": {"total": 20971520, "used": 13962880, "free": 7008640},
11154 "os": "Ubuntu 22.04.4 LTS",
11155 }
11156 asyncio.run(
11157 miner_service.publish_machine_specs(
11158 results=[
11159 (
11160 debug_specs,
11161 ExecutorSSHInfo(
11162 uuid=str(uuid.uuid4()),
11163 address=generate_random_ip(),
11164 port="8001",
11165 ssh_username="test",
11166 ssh_port=22,
11167 python_path="test",
11168 root_dir="test",
11169 ),
11170 )
11171 ],
11172 miner_hotkey="5Cco1xUS8kXuaCzAHAXZ36nr6mLzmY5B9ufxrfb8Q3HB6ZdN",
11173 )
11174 )
11175
11176 asyncio.run(
11177 miner_service.publish_machine_specs(
11178 results=[
11179 (
11180 debug_specs,
11181 ExecutorSSHInfo(
11182 uuid=str(uuid.uuid4()),
11183 address=generate_random_ip(),
11184 port="8001",
11185 ssh_username="test",
11186 ssh_port=22,
11187 python_path="test",
11188 root_dir="test",
11189 ),
11190 )
11191 ],
11192 miner_hotkey="5Cco1xUS8kXuaCzAHAXZ36nr6mLzmY5B9ufxrfb8Q3HB6ZdN",
11193 )
11194 )
11195
11196 time.sleep(2)
11197
11198
11199@cli.command()
11200def debug_set_weights():
11201 """Debug setting weights"""
11202 validator = Validator()
11203 subtensor = validator.get_subtensor()
11204 # fetch miners
11205 miners = validator.fetch_miners(subtensor)
11206 asyncio.run(validator.set_weights(miners=miners, subtensor=subtensor))
11207
11208
11209@cli.command()
11210@click.option("--miner_hotkey", prompt="Miner Hotkey", help="Hotkey of Miner")
11211@click.option("--miner_address", prompt="Miner Address", help="Miner IP Address")
11212@click.option("--miner_port", type=int, prompt="Miner Port", help="Miner Port")
11213def request_job_to_miner(miner_hotkey: str, miner_address: str, miner_port: int):
11214 asyncio.run(_request_job_to_miner(miner_hotkey, miner_address, miner_port))
11215
11216
11217async def _request_job_to_miner(miner_hotkey: str, miner_address: str, miner_port: int):
11218 miner_service: MinerService = ioc["MinerService"]
11219 docker_service: DockerService = ioc["DockerService"]
11220 file_encrypt_service: FileEncryptService = ioc["FileEncryptService"]
11221
11222 docker_hub_digests = await docker_service.get_docker_hub_digests(REPOSITORIES)
11223 encypted_files = file_encrypt_service.ecrypt_miner_job_files()
11224
11225 await miner_service.request_job_to_miner(
11226 MinerJobRequestPayload(
11227 job_batch_id='job_batch_id',
11228 miner_hotkey=miner_hotkey,
11229 miner_address=miner_address,
11230 miner_port=miner_port,
11231 ),
11232 encypted_files=encypted_files,
11233 docker_hub_digests=docker_hub_digests,
11234 )
11235
11236@cli.command()
11237@click.option("--miner_hotkey", prompt="Miner Hotkey", help="Hotkey of Miner")
11238@click.option("--miner_address", prompt="Miner Address", help="Miner IP Address")
11239@click.option("--miner_port", type=int, prompt="Miner Port", help="Miner Port")
11240@click.option("--executor_id", prompt="Executor Id", help="Executor Id")
11241@click.option("--docker_image", prompt="Docker Image", help="Docker Image")
11242def create_container_to_miner(miner_hotkey: str, miner_address: str, miner_port: int, executor_id: str, docker_image: str):
11243 asyncio.run(_create_container_to_miner(miner_hotkey, miner_address, miner_port, executor_id, docker_image))
11244
11245
11246async def _create_container_to_miner(miner_hotkey: str, miner_address: str, miner_port: int, executor_id: str, docker_image: str):
11247 miner_service: MinerService = ioc["MinerService"]
11248
11249 payload = ContainerCreateRequest(
11250 docker_image=docker_image,
11251 user_public_key="user_public_key",
11252 executor_id=executor_id,
11253 miner_hotkey=miner_hotkey,
11254 miner_address=miner_address,
11255 miner_port=miner_port,
11256 )
11257 await miner_service.handle_container(payload)
11258
11259@cli.command()
11260@click.option("--miner_hotkey", prompt="Miner Hotkey", help="Hotkey of Miner")
11261@click.option("--miner_address", prompt="Miner Address", help="Miner IP Address")
11262@click.option("--miner_port", type=int, prompt="Miner Port", help="Miner Port")
11263@click.option("--executor_id", prompt="Executor Id", help="Executor Id")
11264@click.option("--docker_image", prompt="Docker Image", help="Docker Image")
11265def create_custom_container_to_miner(miner_hotkey: str, miner_address: str, miner_port: int, executor_id: str, docker_image: str):
11266 asyncio.run(_create_custom_container_to_miner(miner_hotkey, miner_address, miner_port, executor_id, docker_image))
11267
11268
11269async def _create_custom_container_to_miner(miner_hotkey: str, miner_address: str, miner_port: int, executor_id: str, docker_image: str):
11270 miner_service: MinerService = ioc["MinerService"]
11271 # mock custom options
11272 custom_options = CustomOptions(
11273        volumes=["/var/run/docker.sock:/var/run/docker.sock"],
11274 environment={"UPDATED_PUBLIC_KEY":"user_public_key"},
11275 entrypoint="",
11276 internal_ports=[22, 8002],
11277 startup_commands="/bin/bash -c 'apt-get update && apt-get install -y ffmpeg && pip install opencv-python'",
11278 )
11279 payload = ContainerCreateRequest(
11280 docker_image=docker_image,
11281 user_public_key="user_public_key",
11282 executor_id=executor_id,
11283 miner_hotkey=miner_hotkey,
11284 miner_address=miner_address,
11285 miner_port=miner_port,
11286 custom_options=custom_options
11287 )
11288 await miner_service.handle_container(payload)
11289
11290if __name__ == "__main__":
11291 cli()
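# Editor's note: illustrative invocations of this CLI (argument values are placeholders):
#
#     python src/cli.py debug_send_job_to_miner --miner_hotkey <HOTKEY> --miner_address 1.2.3.4 --miner_port 8000
#     python src/cli.py request_job_to_miner --miner_hotkey <HOTKEY> --miner_address 1.2.3.4 --miner_port 8000
#
# Depending on the installed Click version, the commands may instead be exposed with dashes
# (e.g. `debug-send-job-to-miner`).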
11292
11293
11294
11295---
11296File: /neurons/validators/src/connector.py
11297---
11298
11299import asyncio
11300import time
11301
11302from clients.compute_client import ComputeClient
11303
11304from core.config import settings
11305from core.utils import get_logger, wait_for_services_sync
11306from services.ioc import ioc
11307
11308logger = get_logger(__name__)
11309wait_for_services_sync()
11310
11311
11312async def run_forever():
11313 logger.info("Compute app connector started")
11314 keypair = settings.get_bittensor_wallet().get_hotkey()
11315 compute_app_client = ComputeClient(
11316 keypair, f"{settings.COMPUTE_APP_URI}/validator/{keypair.ss58_address}", ioc["MinerService"]
11317 )
11318 async with compute_app_client:
11319 await compute_app_client.run_forever()
11320
11321
11322def start_process():
11323 while True:
11324 try:
11325 loop = asyncio.new_event_loop()
11326 asyncio.set_event_loop(loop)
11327 loop.run_until_complete(run_forever())
11328 except Exception as e:
11329 logger.error(f"Compute app connector crashed: {e}", exc_info=True)
11330 time.sleep(1)
11331
11332
11333if __name__ == "__main__":
11334 start_process()
11335
11336# def start_connector_process():
11337# p = multiprocessing.Process(target=start_process)
11338# p.start()
11339# return p
11340
11341
11342
11343---
11344File: /neurons/validators/src/job.py
11345---
11346
11347import time
11348import random
11349
11350start_time = time.time()
11351
11352wait_time = random.uniform(10, 30)
11353time.sleep(wait_time)
11354
11355# print("Job finished")
11356print(time.time() - start_time)
11357
11358
11359---
11360File: /neurons/validators/src/test_validator.py
11361---
11362
11363import asyncio
11364import bittensor
11365
11366from core.config import settings
11367from fastapi.testclient import TestClient
11368from concurrent.futures import ThreadPoolExecutor, as_completed
11369from services.docker_service import DockerService
11370from services.ioc import ioc
11371
11372from validator import app
11373
11374client = TestClient(app)
11375
11376
11377def send_post_request():
11378 response = client.post(
11379 "/miner_request",
11380 json={
11381 "miner_hotkey": "5EHgHZBfx4ZwU7GzGCS8VCMBLBEKo5eaCvXKiu6SASwWT6UY",
11382 "miner_address": "localhost",
11383 "miner_port": 8000
11384 },
11385 )
11386 assert response.status_code == 200
11387
11388
11389def test_socket_connections():
11390 num_requests = 10 # Number of simultaneous requests
11391 with ThreadPoolExecutor(max_workers=num_requests) as executor:
11392 futures = [executor.submit(send_post_request) for _ in range(num_requests)]
11393
11394 for future in as_completed(futures):
11395            future.result()  # surfaces any exception or failed assertion from send_post_request
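# Editor's note: send_post_request/test_socket_connections assume a miner endpoint reachable
# at localhost:8000 behind the validator app. One way to run only this module (assuming
# pytest is installed) is:
#
#     pytest neurons/validators/src/test_validator.py -k socket
#
# The file can also be executed directly, in which case the __main__ block below runs instead.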
11397
11398
11399async def check_docker_port_mappings():
11400 docker_service: DockerService = ioc["DockerService"]
11401 miner_hotkey = '5Df8qGLMd19BXByefGCZFN57fWv6jDm5hUbnQeUTu2iqNBhT'
11402 executor_id = 'c272060f-8eae-4265-8e26-1d83ac96b498'
11403 port_mappings = await docker_service.generate_portMappings(miner_hotkey, executor_id)
11404 print('port_mappings ==>', port_mappings)
11405
11406if __name__ == "__main__":
11407 # test_socket_connections()
11408 asyncio.run(check_docker_port_mappings())
11409
11410 config = settings.get_bittensor_config()
11411 subtensor = bittensor.subtensor(config=config)
11412 node = subtensor.substrate
11413
11414 netuid = settings.BITTENSOR_NETUID
11415 tempo = subtensor.tempo(netuid)
11416 weights_rate_limit = node.query("SubtensorModule", "WeightsSetRateLimit", [netuid]).value
11418 serving_rate_limit = node.query("SubtensorModule", "ServingRateLimit", [netuid]).value
11419 print('rate limit ===>', tempo, weights_rate_limit, serving_rate_limit)
11420
11421
11422
11423---
11424File: /neurons/validators/src/validator.py
11425---
11426
11427import asyncio
11428import logging
11429
11430import uvicorn
11431from fastapi import FastAPI
11432
11433from core.config import settings
11434from core.utils import configure_logs_of_other_modules, wait_for_services_sync
11435from core.validator import Validator
11436
11437configure_logs_of_other_modules()
11438wait_for_services_sync()
11439
11440
11441async def app_lifespan(app: FastAPI):
11442 if settings.DEBUG:
11443 validator = Validator(debug_miner=settings.get_debug_miner())
11444 else:
11445 validator = Validator()
11446 # Run the miner in the background
11447 task = asyncio.create_task(validator.start())
11448
11449 try:
11450 yield
11451 finally:
11452 await validator.stop() # Ensure proper cleanup
11453 await task # Wait for the background task to complete
11454 logging.info("Validator exited successfully.")
11455
11456
11457app = FastAPI(
11458 title=settings.PROJECT_NAME,
11459 lifespan=app_lifespan,
11460)
11461
11462# app.include_router(apis_router)
11463
11464reload = settings.ENV == "dev"
11465
11466if __name__ == "__main__":
11467 uvicorn.run("validator:app", host="0.0.0.0", port=settings.INTERNAL_PORT, reload=reload)
11468
11469
11470
11471---
11472File: /neurons/validators/tests/__init__.py
11473---
11474
11475
11476
11477
11478---
11479File: /neurons/validators/docker_build.sh
11480---
11481
11482#!/bin/bash
11483set -eux -o pipefail
11484
11485IMAGE_NAME="daturaai/compute-subnet-validator:$TAG"
11486
11487docker build --build-context datura=../../datura -t $IMAGE_NAME .
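# Editor's note: TAG must be set in the environment because this script runs with `set -u`.
# An illustrative invocation (the tag value is a placeholder):
#
#   TAG=latest ./docker_build.sh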
11488
11489
11490---
11491File: /neurons/validators/docker_publish.sh
11492---
11493
11494#!/bin/bash
11495set -eux -o pipefail
11496
11497source ./docker_build.sh
11498
11499echo "$DOCKERHUB_PAT" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
11500docker push "$IMAGE_NAME"
11501
11502
11503---
11504File: /neurons/validators/docker_runner_build.sh
11505---
11506
11507#!/bin/bash
11508set -eux -o pipefail
11509
11510IMAGE_NAME="daturaai/compute-subnet-validator-runner:$TAG"
11511
11512docker build --file Dockerfile.runner -t $IMAGE_NAME .
11513
11514
11515---
11516File: /neurons/validators/docker_runner_publish.sh
11517---
11518
11519#!/bin/bash
11520set -eux -o pipefail
11521
11522source ./docker_runner_build.sh
11523
11524echo "$DOCKERHUB_PAT" | docker login -u "$DOCKERHUB_USERNAME" --password-stdin
11525docker push "$IMAGE_NAME"
11526
11527
11528---
11529File: /neurons/validators/entrypoint.sh
11530---
11531
11532#!/bin/sh
11533set -eu
11534
11535docker compose up --pull always --detach --wait --force-recreate
11536
11537# Clean docker images
11538docker image prune -f
11539
11540# Remove all Docker images with a name but no tag
11541# docker images --filter "dangling=false" --format "{{.Repository}}:{{.Tag}} {{.ID}}" | grep ":<none>" | awk '{print $2}' | xargs -r docker rmi
11542
11543while true
11544do
11545 docker compose logs -f
11546 echo 'All containers died'
11547 sleep 10
11548done
11549
11550
11551
11552---
11553File: /neurons/validators/README.md
11554---
11555
11556# Validator
11557
11558## System Requirements
11559
11560For validation, a validator machine will need:
11561
11562- **CPU**: 4 cores
11563- **RAM**: 8 GB
11564
11565Ensure that your machine meets these requirements before proceeding with the setup.
11566
11567---
11568
11569First, register your validator on the subnet and regenerate your Bittensor coldkey and validator hotkey on the machine.
11570
11571For installation of btcli, check [this guide](https://github.com/opentensor/bittensor/blob/master/README.md#install-bittensor-sdk)
11572```
11573btcli s register --netuid 51
11574```
11575```
11576btcli w regen_coldkeypub
11577```
11578```
11579btcli w regen_hotkey
11580```
11581
11582## Installation
11583
11584### Using Docker
11585
11586#### Step 1: Clone Git repo
11587
11588```
11589git clone https://github.com/Datura-ai/compute-subnet.git
11590```
11591
11592#### Step 2: Install Required Tools
11593
11594```
11595cd compute-subnet && chmod +x scripts/install_validator_on_ubuntu.sh && ./scripts/install_validator_on_ubuntu.sh
11596```
11597
11598Verify the Docker installation:
11599
11600```
11601docker --version
11602```
11603If Docker did not install correctly, follow [this link](https://docs.docker.com/engine/install/).
11604
11605#### Step 3: Setup ENV
11606```
11607cp neurons/validators/.env.template neurons/validators/.env
11608```
11609
11610Fill in your own values for `BITTENSOR_WALLET_NAME`, `BITTENSOR_WALLET_HOTKEY_NAME`, and `HOST_WALLET_DIR`.
11611If you want, you can use different ports for `INTERNAL_PORT` and `EXTERNAL_PORT`; see the example `.env` below.
11612
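A minimal example `.env` (the values below are placeholders; keep any other variables from the template as they are):

```
BITTENSOR_WALLET_NAME=validator
BITTENSOR_WALLET_HOTKEY_NAME=default
HOST_WALLET_DIR=/home/ubuntu/.bittensor/wallets
INTERNAL_PORT=8000
EXTERNAL_PORT=8000
```
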
11613#### Step 4: Docker Compose Up
11614
11615```
11616cd neurons/validators && docker compose up -d
11617```
11618
11619
11620
11621---
11622File: /neurons/validators/run.sh
11623---
11624
11625#!/bin/sh
11626
11627# db migrate
11628alembic upgrade head
11629
11630# run fastapi app
11631python src/validator.py
11632
11633
11634---
11635File: /neurons/__init__.py
11636---
11637
11638
11639
11640
11641---
11642File: /scripts/check_compatibility.sh
11643---
11644
11645#!/bin/bash
11646
11647if [ -z "$1" ]; then
11648 echo "Please provide a Python version as an argument."
11649 exit 1
11650fi
11651
11652python_version="$1"
11653all_passed=true
11654
11655GREEN='\033[0;32m'
11656YELLOW='\033[0;33m'
11657RED='\033[0;31m'
11658NC='\033[0m' # No Color
11659
11660check_compatibility() {
11661 all_supported=0
11662
11663 while read -r requirement; do
11664 # Skip lines starting with git+
11665 if [[ "$requirement" == git+* ]]; then
11666 continue
11667 fi
11668
11669 package_name=$(echo "$requirement" | awk -F'[!=<>]' '{print $1}' | awk -F'[' '{print $1}') # Strip off brackets
11670 echo -n "Checking $package_name... "
11671
11672 url="https://pypi.org/pypi/$package_name/json"
11673 response=$(curl -s $url)
11674 status_code=$(curl -s -o /dev/null -w "%{http_code}" $url)
11675
11676 if [ "$status_code" != "200" ]; then
11677 echo -e "${RED}Information not available for $package_name. Failure.${NC}"
11678 all_supported=1
11679 continue
11680 fi
11681
11682 classifiers=$(echo "$response" | jq -r '.info.classifiers[]')
11683 requires_python=$(echo "$response" | jq -r '.info.requires_python')
11684
11685 base_version="Programming Language :: Python :: ${python_version%%.*}"
11686 specific_version="Programming Language :: Python :: $python_version"
11687
11688 if echo "$classifiers" | grep -q "$specific_version" || echo "$classifiers" | grep -q "$base_version"; then
11689 echo -e "${GREEN}Supported${NC}"
11690 elif [ "$requires_python" != "null" ]; then
11691 if echo "$requires_python" | grep -Eq "==$python_version|>=$python_version|<=$python_version"; then
11692 echo -e "${GREEN}Supported${NC}"
11693 else
11694 echo -e "${RED}Not compatible with Python $python_version due to constraint $requires_python.${NC}"
11695 all_supported=1
11696 fi
11697 else
11698 echo -e "${YELLOW}Warning: Specific version not listed, assuming compatibility${NC}"
11699 fi
11700 done < requirements.txt
11701
11702 return $all_supported
11703}
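# Editor's note: a worked example of the package-name extraction above. For a requirements
# line such as "torch[cuda]>=2.1.0", the first awk (splitting on !=<>) yields "torch[cuda]"
# and the second awk (splitting on "[") yields package_name="torch" for the PyPI lookup.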
11704
11705echo "Checking compatibility for Python $python_version..."
11706check_compatibility
11707if [ $? -eq 0 ]; then
11708 echo -e "${GREEN}All requirements are compatible with Python $python_version.${NC}"
11709else
11710 echo -e "${RED}All requirements are NOT compatible with Python $python_version.${NC}"
11711 all_passed=false
11712fi
11713
11714echo ""
11715if $all_passed; then
11716 echo -e "${GREEN}All tests passed.${NC}"
11717else
11718 echo -e "${RED}All tests did not pass.${NC}"
11719 exit 1
11720fi
11721
11722
11723
11724---
11725File: /scripts/check_requirements_changes.sh
11726---
11727
11728#!/bin/bash
11729
11730# Check if requirements files have changed in the last commit
11731if git diff --name-only HEAD~1 | grep -E 'requirements.txt'; then
11732 echo "Requirements files have changed. Running compatibility checks..."
11733 echo 'export REQUIREMENTS_CHANGED="true"' >> $BASH_ENV
11734else
11735 echo "Requirements files have not changed. Skipping compatibility checks..."
11736 echo 'export REQUIREMENTS_CHANGED="false"' >> $BASH_ENV
11737fi
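# Editor's note: $BASH_ENV is assumed to be the CI-provided file (e.g. CircleCI's) that is
# sourced before each subsequent step, which is how REQUIREMENTS_CHANGED propagates to
# later steps in the pipeline.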
11738
11739
11740
11741---
11742File: /scripts/install_dev.sh
11743---
11744
11745#!/bin/bash
11746
11747set -u
11748
11749# enable command completion
11750set -o history -o histexpand
11751
11752abort() {
11753 printf "%s\n" "$1"
11754 exit 1
11755}
11756
11757getc() {
11758 local save_state
11759 save_state=$(/bin/stty -g)
11760 /bin/stty raw -echo
11761 IFS= read -r -n 1 -d '' "$@"
11762 /bin/stty "$save_state"
11763}
11764
11765exit_on_error() {
11766 exit_code=$1
11767 last_command=${@:2}
11768 if [ $exit_code -ne 0 ]; then
11769 >&2 echo "\"${last_command}\" command failed with exit code ${exit_code}."
11770 exit $exit_code
11771 fi
11772}
11773
11774shell_join() {
11775 local arg
11776 printf "%s" "$1"
11777 shift
11778 for arg in "$@"; do
11779 printf " "
11780 printf "%s" "${arg// /\ }"
11781 done
11782}
11783
11784# string formatters
11785if [[ -t 1 ]]; then
11786 tty_escape() { printf "\033[%sm" "$1"; }
11787else
11788 tty_escape() { :; }
11789fi
11790tty_mkbold() { tty_escape "1;$1"; }
11791tty_underline="$(tty_escape "4;39")"
11792tty_blue="$(tty_mkbold 34)"
11793tty_red="$(tty_mkbold 31)"
11794tty_bold="$(tty_mkbold 39)"
11795tty_reset="$(tty_escape 0)"
11796
11797ohai() {
11798 printf "${tty_blue}==>${tty_bold} %s${tty_reset}\n" "$(shell_join "$@")"
11799}
11800
11801wait_for_user() {
11802 local c
11803 echo
11804 echo "Press RETURN to continue or any other key to abort"
11805 getc c
11806 # we test for \r and \n because some stuff does \r instead
11807 if ! [[ "$c" == $'\r' || "$c" == $'\n' ]]; then
11808 exit 1
11809 fi
11810}
11811
11812#install pre
11813install_pre() {
11814 sudo apt update
11815 sudo apt install --no-install-recommends --no-install-suggests -y sudo apt-utils curl git cmake build-essential
11816 exit_on_error $?
11817}
11818
11819# check if python is installed, if not install it
11820install_python() {
11821 # Check if python3.11 is installed
11822 if command -v python3.11 &> /dev/null
11823 then
11824 # Check the version
11825 PYTHON_VERSION=$(python3.11 --version 2>&1)
11826 if [[ $PYTHON_VERSION == *"Python 3.11"* ]]; then
11827 ohai "Python 3.11 is already installed."
11828 else
11829 ohai "Linking python to python 3.11"
11830 sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
11831 python -m pip install cffi
11832 python -m pip install cryptography
11833 fi
11834 else
11835 ohai "Installing Python 3.11"
11836 add-apt-repository ppa:deadsnakes/ppa
11837 apt install python3.11
11838 sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
11839 python -m pip install cffi
11840 python -m pip install cryptography
11841 fi
11842
11843 # check if PDM is installed
11844 if command -v pdm &> /dev/null
11845 then
11846 ohai "PDM is already installed."
11847 echo "Checking PDM version..."
11848 pdm --version
11849 else
11850 ohai "Installing PDM..."
11851 curl -sSL https://pdm-project.org/install-pdm.py | python3 -
11852
11853 local bashrc_file="/root/.bashrc"
11854 local path_string="export PATH=/root/.local/bin:\$PATH"
11855
11856 if ! grep -Fxq "$path_string" $bashrc_file; then
11857 echo "$path_string" >> $bashrc_file
11858 echo "Added $path_string to $bashrc_file"
11859 else
11860 echo "$path_string already present in $bashrc_file"
11861 fi
11862
11863 export PATH=/root/.local/bin:$PATH
11864
11865 echo "Checking PDM version..."
11866 pdm --version
11867 fi
11868
11869 PROJECT_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
11870 PROJECT_DIR=${PROJECT_DIR}/../
11871 cd ${PROJECT_DIR}
11872
11873 ohai "Installing PDM packages in root folder."
11874 pdm install -d
11875
11876 ohai "Installing pre-commit for the project."
11877 pdm run pre-commit install
11878}
11879
11880
11881
11882ohai "This script will install:"
11883echo "git"
11884echo "curl"
11885echo "python3.11 and pdm"
11886echo "python3-pip"
11887echo "pre-commit with ruff"
11888
11889wait_for_user
11890install_pre
11891install_python
11892
11893
11894---
11895File: /scripts/install_executor_on_ubuntu.sh
11896---
11897
11898#!/bin/bash
11899set -u
11900
11901# enable command completion
11902set -o history -o histexpand
11903
11904abort() {
11905 printf "%s\n" "$1"
11906 exit 1
11907}
11908
11909getc() {
11910 local save_state
11911 save_state=$(/bin/stty -g)
11912 /bin/stty raw -echo
11913 IFS= read -r -n 1 -d '' "$@"
11914 /bin/stty "$save_state"
11915}
11916
11917exit_on_error() {
11918 exit_code=$1
11919 last_command=${@:2}
11920 if [ $exit_code -ne 0 ]; then
11921 >&2 echo "\"${last_command}\" command failed with exit code ${exit_code}."
11922 exit $exit_code
11923 fi
11924}
11925
11926shell_join() {
11927 local arg
11928 printf "%s" "$1"
11929 shift
11930 for arg in "$@"; do
11931 printf " "
11932 printf "%s" "${arg// /\ }"
11933 done
11934}
11935
11936# string formatters
11937if [[ -t 1 ]]; then
11938 tty_escape() { printf "\033[%sm" "$1"; }
11939else
11940 tty_escape() { :; }
11941fi
11942tty_mkbold() { tty_escape "1;$1"; }
11943tty_underline="$(tty_escape "4;39")"
11944tty_blue="$(tty_mkbold 34)"
11945tty_red="$(tty_mkbold 31)"
11946tty_bold="$(tty_mkbold 39)"
11947tty_reset="$(tty_escape 0)"
11948
11949ohai() {
11950 printf "${tty_blue}==>${tty_bold} %s${tty_reset}\n" "$(shell_join "$@")"
11951}
11952
11953wait_for_user() {
11954 local c
11955 echo
11956 echo "Press RETURN to continue or any other key to abort"
11957 getc c
11958 # we test for \r and \n because some stuff does \r instead
11959 if ! [[ "$c" == $'\r' || "$c" == $'\n' ]]; then
11960 exit 1
11961 fi
11962}
11963
11964#install pre
11965install_pre() {
11966 sudo apt update
11967 sudo apt install --no-install-recommends --no-install-suggests -y sudo apt-utils curl git cmake build-essential
11968 exit_on_error $?
11969}
11970
11971# check if python is installed, if not install it
11972install_python() {
11973 # Check if python3.11 is installed
11974 if command -v python3.11 &> /dev/null
11975 then
11976 # Check the version
11977 PYTHON_VERSION=$(python3.11 --version 2>&1)
11978 if [[ $PYTHON_VERSION == *"Python 3.11"* ]]; then
11979 ohai "Python 3.11 is already installed."
11980 else
11981 ohai "Linking python to python 3.11"
11982 sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
11983 python -m pip install cffi
11984 python -m pip install cryptography
11985 fi
11986 else
11987 ohai "Installing Python 3.11"
11988 add-apt-repository ppa:deadsnakes/ppa
11989 sudo apt install python3.11
11990 sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
11991 python -m pip install cffi
11992 python -m pip install cryptography
11993 fi
11994
11995 # check if PDM is installed
11996 if command -v pdm &> /dev/null
11997 then
11998 ohai "PDM is already installed."
11999 echo "Checking PDM version..."
12000 pdm --version
12001 else
12002 ohai "Installing PDM..."
12003 sudo apt install -y python3.12-venv
12004 curl -sSL https://pdm-project.org/install-pdm.py | python3 -
12005
12006 local bashrc_file="/root/.bashrc"
12007 local path_string="export PATH=/root/.local/bin:\$PATH"
12008
12009 if ! grep -Fxq "$path_string" $bashrc_file; then
12010 echo "$path_string" >> $bashrc_file
12011 echo "Added $path_string to $bashrc_file"
12012 else
12013 echo "$path_string already present in $bashrc_file"
12014 fi
12015
12016 export PATH=/root/.local/bin:$PATH
12017
12018 echo "Checking PDM version..."
12019 pdm --version
12020 fi
12021}
12022
12023# install redis
12024install_redis() {
12025 if command -v redis-server &> /dev/null
12026 then
12027 ohai "Redis is already installed."
12028 echo "Checking Redis version..."
12029 redis-server --version
12030 else
12031 ohai "Installing Redis..."
12032
12033 sudo apt install -y redis-server
12034
12035 echo "Starting Redis server..."
12036 sudo systemctl start redis-server.service
12037
12038 echo "Checking Redis server status..."
12039 sudo systemctl status redis-server.service
12040 fi
12041}
12042
12043# install postgresql
12044install_postgresql() {
12045 if command -v psql &> /dev/null
12046 then
12047 ohai "PostgreSQL is already installed."
12048 echo "Checking PostgreSQL version..."
12049 psql --version
12050
12051 # Check if the database exists
12052 DB_EXISTS=$(sudo -u postgres psql -tAc "SELECT 1 FROM pg_database WHERE datname='compute_subnet_db'")
12053 if [ "$DB_EXISTS" == "1" ]; then
12054 echo "Database compute_subnet_db already exists."
12055 else
12056 echo "Creating database compute_subnet_db..."
12057 sudo -u postgres createdb compute_subnet_db
12058 fi
12059 else
12060 echo "Installing PostgreSQL..."
12061 sudo apt install -y postgresql postgresql-contrib
12062
12063 echo "Starting PostgreSQL server..."
12064 sudo systemctl start postgresql.service
12065
12066 echo "Setting password for postgres user..."
12067 sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'password';"
12068
12069 echo "Creating database compute_subnet_db..."
12070 sudo -u postgres createdb compute_subnet_db
12071 fi
12072}
12073
12074# install btcli
12075install_btcli() {
12076 if command -v btcli &> /dev/null
12077 then
12078 ohai "BtCLI is already installed."
12079 else
12080 ohai "Installing BtCLI..."
12081
12082 sudo apt install -y pipx
12083 pipx install bittensor
12084 source ~/.bashrc
12085 fi
12086}
12087
12088# install docker
12089install_docker() {
12090 if command -v docker &> /dev/null; then
12091 ohai "Docker is already installed."
12092 return 0
12093 else
12094 ohai "Installing Docker..."
12095 sudo apt-get update -y
12096 sudo apt-get install -y ca-certificates curl
12097 sudo install -m 0755 -d /etc/apt/keyrings
12098 sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
12099 sudo chmod a+r /etc/apt/keyrings/docker.asc
12100
12101 # Add the repository to Apt sources:
12102 echo \
12103 "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
12104 $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
12105 sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
12106 sudo apt-get update -y
12107 sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
12108 sudo groupadd docker
12109 sudo usermod -aG docker $USER
12110 newgrp docker
12111 fi
12112}
12113
12114ohai "This script will install:"
12115echo "docker"
12116
12117
12118wait_for_user
12119install_pre
12120install_docker
12121
12122
12123
12124---
12125File: /scripts/install_miner_on_runpod.sh
12126---
12127
12128#!/bin/bash
12129set -u
12130
12131# enable command completion
12132set -o history -o histexpand
12133
12134abort() {
12135 printf "%s\n" "$1"
12136 exit 1
12137}
12138
12139getc() {
12140 local save_state
12141 save_state=$(/bin/stty -g)
12142 /bin/stty raw -echo
12143 IFS= read -r -n 1 -d '' "$@"
12144 /bin/stty "$save_state"
12145}
12146
12147exit_on_error() {
12148 exit_code=$1
12149 last_command=${@:2}
12150 if [ $exit_code -ne 0 ]; then
12151 >&2 echo "\"${last_command}\" command failed with exit code ${exit_code}."
12152 exit $exit_code
12153 fi
12154}
12155
12156shell_join() {
12157 local arg
12158 printf "%s" "$1"
12159 shift
12160 for arg in "$@"; do
12161 printf " "
12162 printf "%s" "${arg// /\ }"
12163 done
12164}
12165
12166# string formatters
12167if [[ -t 1 ]]; then
12168 tty_escape() { printf "\033[%sm" "$1"; }
12169else
12170 tty_escape() { :; }
12171fi
12172tty_mkbold() { tty_escape "1;$1"; }
12173tty_underline="$(tty_escape "4;39")"
12174tty_blue="$(tty_mkbold 34)"
12175tty_red="$(tty_mkbold 31)"
12176tty_bold="$(tty_mkbold 39)"
12177tty_reset="$(tty_escape 0)"
12178
12179ohai() {
12180 printf "${tty_blue}==>${tty_bold} %s${tty_reset}\n" "$(shell_join "$@")"
12181}
12182
12183wait_for_user() {
12184 local c
12185 echo
12186 echo "Press Enter to continue or any other key to abort"
12187 getc c
12188 # we test for \r and \n because some stuff does \r instead
12189 if ! [[ "$c" == $'\r' || "$c" == $'\n' ]]; then
12190 exit 1
12191 fi
12192}
12193
12194#install pre
12195install_pre() {
12196 apt update
12197 apt upgrade
12198 apt install --no-install-recommends --no-install-suggests -y apt-utils curl git cmake build-essential nano
12199 exit_on_error $?
12200}
12201
12202# check if python is installed, if not install it
12203install_python() {
12204 # Check if python3.11 is installed
12205 if command -v python3.11 &> /dev/null
12206 then
12207 # Check the version
12208 PYTHON_VERSION=$(python3.11 --version 2>&1)
12209 if [[ $PYTHON_VERSION == *"Python 3.11"* ]]; then
12210 echo "Python 3.11 is already installed."
12211 else
12212 echo "Linking python to python 3.11"
12213 update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
12214
12215 # Ensure pip is installed
12216 python3.11 -m ensurepip --upgrade
12217
12218 # Install necessary packages
12219 python -m pip install --upgrade pip
12220 pip install cffi
12221 pip install cryptography
12222
12223 # Install bittensor
12224 pip install bittensor
12225 pip install bittensor[torch]
12226 fi
12227 else
12228 ohai "Installing Python 3.11..."
12229 add-apt-repository ppa:deadsnakes/ppa
12230 apt update
12231 apt install -y python3.11 python3.11-venv python3.11-dev
12232
12233 echo "Linking python to python 3.11"
12234 update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
12235
12236 # Ensure pip is installed
12237 python3.11 -m ensurepip --upgrade
12238
12239 # Install necessary packages
12240 python -m pip install --upgrade pip
12241 pip install cffi
12242 pip install cryptography
12243
12244 # Install bittensor
12245 pip install bittensor
12246 pip install bittensor[torch]
12247 fi
12248
12249 # check if PDM is installed
12250 if command -v pdm &> /dev/null
12251 then
12252 ohai "PDM is already installed."
12253 echo "Checking PDM version..."
12254 pdm --version
12255 else
12256 ohai "Installing PDM..."
12257 curl -sSL https://pdm-project.org/install-pdm.py | python3 -
12258
12259 local bashrc_file="$HOME/.bashrc"
12260 local path_string="export PATH=$HOME/.local/bin:\$PATH"
12261
12262 if ! grep -Fxq "$path_string" $bashrc_file; then
12263 echo "$path_string" >> $bashrc_file
12264 echo "Added $path_string to $bashrc_file"
12265 else
12266 echo "$path_string already present in $bashrc_file"
12267 fi
12268
12269 export PATH=$HOME/.local/bin:$PATH
12270
12271 echo "Checking PDM version..."
12272 pdm --version
12273 fi
12274}
12275
12276# install postgresql
12277install_postgresql() {
12278 if command -v psql &> /dev/null
12279 then
12280 echo "PostgreSQL is already installed."
12281 echo "Checking PostgreSQL version..."
12282 psql --version
12283
12284 # Check if the database exists
12285 DB_EXISTS=$(runuser -l postgres -c "psql -tAc \"SELECT 1 FROM pg_database WHERE datname='compute_subnet_db'\"")
12286 if [ "$DB_EXISTS" == "1" ]; then
12287 echo "Database compute_subnet_db already exists."
12288 else
12289 echo "Creating database compute_subnet_db..."
12290 runuser -l postgres -c "createdb compute_subnet_db"
12291 fi
12292 else
12293 ohai "Installing PostgreSQL..."
12294
12295 apt install -y postgresql postgresql-contrib
12296
12297 echo "Starting PostgreSQL server..."
12298 service postgresql start
12299
12300 read -p "Enter Postgres password: " pg_password
12301
12302 # Set the password for the postgres user
12303 runuser -l postgres -c "psql -c \"ALTER USER postgres PASSWORD '$pg_password';\""
12304
12305 # Create the database as the postgres user
12306 runuser -l postgres -c "createdb compute_subnet_db"
12307 fi
12308}
12309
12310# install miner dependencies
12311install_miner_dependencies() {
12312 ohai "Installing miner..."
12313
12314 # Get the directory of the current script
12315 SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
12316
12317 # Navigate to the PDM root path relative to the script directory
12318 cd "$SCRIPT_DIR/../neurons/miners" || exit
12319
12320 # Install PDM dependencies
12321 pdm install
12322}
12323
12324ohai "This script will install:"
12325echo "git"
12326echo "curl"
12327echo "python3.11 and pdm"
12328echo "python3-pip"
12329echo "postgresql"
12330echo "bittensor"
12331echo "install miner dependencies"
12332
12333wait_for_user
12334install_pre
12335install_python
12336install_postgresql
12337install_miner_dependencies
12338
12339
12340---
12341File: /scripts/install_miner_on_ubuntu.sh
12342---
12343
12344#!/bin/bash
12345set -u
12346
12347# enable command completion
12348set -o history -o histexpand
12349
12350abort() {
12351 printf "%s\n" "$1"
12352 exit 1
12353}
12354
12355getc() {
12356 local save_state
12357 save_state=$(/bin/stty -g)
12358 /bin/stty raw -echo
12359 IFS= read -r -n 1 -d '' "$@"
12360 /bin/stty "$save_state"
12361}
12362
12363exit_on_error() {
12364 exit_code=$1
12365 last_command=${@:2}
12366 if [ $exit_code -ne 0 ]; then
12367 >&2 echo "\"${last_command}\" command failed with exit code ${exit_code}."
12368 exit $exit_code
12369 fi
12370}
12371
12372shell_join() {
12373 local arg
12374 printf "%s" "$1"
12375 shift
12376 for arg in "$@"; do
12377 printf " "
12378 printf "%s" "${arg// /\ }"
12379 done
12380}
12381
12382# string formatters
12383if [[ -t 1 ]]; then
12384 tty_escape() { printf "\033[%sm" "$1"; }
12385else
12386 tty_escape() { :; }
12387fi
12388tty_mkbold() { tty_escape "1;$1"; }
12389tty_underline="$(tty_escape "4;39")"
12390tty_blue="$(tty_mkbold 34)"
12391tty_red="$(tty_mkbold 31)"
12392tty_bold="$(tty_mkbold 39)"
12393tty_reset="$(tty_escape 0)"
12394
12395ohai() {
12396 printf "${tty_blue}==>${tty_bold} %s${tty_reset}\n" "$(shell_join "$@")"
12397}
12398
12399wait_for_user() {
12400 local c
12401 echo
12402 echo "Press RETURN to continue or any other key to abort"
12403 getc c
12404 # we test for \r and \n because some stuff does \r instead
12405 if ! [[ "$c" == $'\r' || "$c" == $'\n' ]]; then
12406 exit 1
12407 fi
12408}
12409
12410#install pre
12411install_pre() {
12412 sudo apt update
12413 sudo apt install --no-install-recommends --no-install-suggests -y sudo apt-utils curl git cmake build-essential
12414 exit_on_error $?
12415}
12416
12417# check if python is installed, if not install it
12418install_python() {
12419 # Check if python3.11 is installed
12420 if command -v python3.11 &> /dev/null
12421 then
12422 # Check the version
12423 PYTHON_VERSION=$(python3.11 --version 2>&1)
12424 if [[ $PYTHON_VERSION == *"Python 3.11"* ]]; then
12425 ohai "Python 3.11 is already installed."
12426 else
12427 ohai "Linking python to python 3.11"
12428 sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
12429 python -m pip install cffi
12430 python -m pip install cryptography
12431 fi
12432 else
12433 ohai "Installing Python 3.11"
12434 add-apt-repository ppa:deadsnakes/ppa
12435 sudo apt install python3.11
12436 sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
12437 python -m pip install cffi
12438 python -m pip install cryptography
12439 fi
12440
12441 # check if PDM is installed
12442 if command -v pdm &> /dev/null
12443 then
12444 ohai "PDM is already installed."
12445 echo "Checking PDM version..."
12446 pdm --version
12447 else
12448 ohai "Installing PDM..."
12449 sudo apt install -y python3.12-venv
12450 curl -sSL https://pdm-project.org/install-pdm.py | python3 -
12451
12452 local bashrc_file="/root/.bashrc"
12453 local path_string="export PATH=/root/.local/bin:\$PATH"
12454
12455 if ! grep -Fxq "$path_string" $bashrc_file; then
12456 echo "$path_string" >> $bashrc_file
12457 echo "Added $path_string to $bashrc_file"
12458 else
12459 echo "$path_string already present in $bashrc_file"
12460 fi
12461
12462 export PATH=/root/.local/bin:$PATH
12463
12464 echo "Checking PDM version..."
12465 pdm --version
12466 fi
12467}
12468
12469# install redis
12470install_redis() {
12471 if command -v redis-server &> /dev/null
12472 then
12473 ohai "Redis is already installed."
12474 echo "Checking Redis version..."
12475 redis-server --version
12476 else
12477 ohai "Installing Redis..."
12478
12479 sudo apt install -y redis-server
12480
12481 echo "Starting Redis server..."
12482 sudo systemctl start redis-server.service
12483
12484 echo "Checking Redis server status..."
12485 sudo systemctl status redis-server.service
12486 fi
12487}
12488
12489# install postgresql
12490install_postgresql() {
12491 if command -v psql &> /dev/null
12492 then
12493 ohai "PostgreSQL is already installed."
12494 echo "Checking PostgreSQL version..."
12495 psql --version
12496
12497 # Check if the database exists
12498 DB_EXISTS=$(sudo -u postgres psql -tAc "SELECT 1 FROM pg_database WHERE datname='compute_subnet_db'")
12499 if [ "$DB_EXISTS" == "1" ]; then
12500 echo "Database compute_subnet_db already exists."
12501 else
12502 echo "Creating database compute_subnet_db..."
12503 sudo -u postgres createdb compute_subnet_db
12504 fi
12505 else
12506 echo "Installing PostgreSQL..."
12507 sudo apt install -y postgresql postgresql-contrib
12508
12509 echo "Starting PostgreSQL server..."
12510 sudo systemctl start postgresql.service
12511
12512 echo "Setting password for postgres user..."
12513 sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'password';"
12514
12515 echo "Creating database compute_subnet_db..."
12516 sudo -u postgres createdb compute_subnet_db
12517 fi
12518}
12519
12520# install btcli
12521install_btcli() {
12522 if command -v btcli &> /dev/null
12523 then
12524 ohai "BtCLI is already installed."
12525 else
12526 ohai "Installing BtCLI..."
12527
12528 sudo apt install -y pipx
12529 pipx install bittensor
12530 source ~/.bashrc
12531 fi
12532}
12533
12534# install docker
12535install_docker() {
12536 if command -v docker &> /dev/null; then
12537 ohai "Docker is already installed."
12538 return 0
12539 else
12540 ohai "Installing Docker..."
12541 sudo apt-get update -y
12542 sudo apt-get install -y ca-certificates curl
12543 sudo install -m 0755 -d /etc/apt/keyrings
12544 sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
12545 sudo chmod a+r /etc/apt/keyrings/docker.asc
12546
12547 # Add the repository to Apt sources:
12548 echo \
12549 "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
12550 $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
12551 sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
12552 sudo apt-get update -y
12553 sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
12554 sudo groupadd docker
12555 sudo usermod -aG docker $USER
12556 newgrp docker
12557 fi
12558}
12559
12560ohai "This script will install:"
12561echo "bittensor"
12562echo "docker"
12563
12564
12565wait_for_user
12566install_pre
12567install_btcli
12568install_docker
12569
12570
12571
12572---
12573File: /scripts/install_staging.sh
12574---
12575
12576#!/bin/bash
12577
12578# Section 1: Build/Install
12579# This section is for first-time setup and installations.
12580
12581install_dependencies() {
12582 # Function to install packages on macOS
12583 install_mac() {
12584 which brew > /dev/null
12585 if [ $? -ne 0 ]; then
12586 echo "Installing Homebrew..."
12587 /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
12588 fi
12589 echo "Updating Homebrew packages..."
12590 brew update
12591 echo "Installing required packages..."
12592 brew install make llvm curl libssl protobuf tmux
12593 }
12594
12595 # Function to install packages on Ubuntu/Debian
12596 install_ubuntu() {
12597 echo "Updating system packages..."
12598 sudo apt update
12599 echo "Installing required packages..."
12600 sudo apt install --assume-yes make build-essential git clang curl libssl-dev llvm libudev-dev protobuf-compiler tmux
12601 }
12602
12603 # Detect OS and call the appropriate function
12604 if [[ "$OSTYPE" == "darwin"* ]]; then
12605 install_mac
12606 elif [[ "$OSTYPE" == "linux-gnu"* ]]; then
12607 install_ubuntu
12608 else
12609 echo "Unsupported operating system."
12610 exit 1
12611 fi
12612
12613 # Install rust and cargo
12614 curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
12615
12616 # Update your shell's source to include Cargo's path
12617 source "$HOME/.cargo/env"
12618}
12619
12620# Call install_dependencies only if it's the first time running the script
12621if [ ! -f ".dependencies_installed" ]; then
12622 install_dependencies
12623 touch .dependencies_installed
12624fi
12625
12626
12627# Section 2: Test/Run
12628# This section is for running and testing the setup.
12629
12630# Create a coldkey for the owner role
12631wallet=${1:-owner}
12632
12633# Logic for setting up and running the environment
12634setup_environment() {
12635 # Clone subtensor and enter the directory
12636 if [ ! -d "subtensor" ]; then
12637 git clone https://github.com/opentensor/subtensor.git
12638 fi
12639 cd subtensor
12640 git pull
12641
12642 # Update to the nightly version of rust
12643 ./scripts/init.sh
12644
12645 cd ../bittensor-subnet-template
12646
12647 # Install the bittensor-subnet-template python package
12648 python -m pip install -e .
12649
12650 # Create and set up wallets
12651 # This section can be skipped if wallets are already set up
12652 if [ ! -f ".wallets_setup" ]; then
12653 btcli wallet new_coldkey --wallet.name $wallet --no_password --no_prompt
12654 btcli wallet new_coldkey --wallet.name miner --no_password --no_prompt
12655 btcli wallet new_hotkey --wallet.name miner --wallet.hotkey default --no_prompt
12656 btcli wallet new_coldkey --wallet.name validator --no_password --no_prompt
12657 btcli wallet new_hotkey --wallet.name validator --wallet.hotkey default --no_prompt
12658 touch .wallets_setup
12659 fi
12660
12661}
12662
12663# Call setup_environment every time
12664setup_environment
12665
12666## Setup localnet
12667# assumes we are in the bittensor-subnet-template/ directory
12668# Initialize your local subtensor chain in development mode. This command will set up and run a local subtensor network.
12669cd ../subtensor
12670
12671# Start a new tmux session and create a new pane, but do not switch to it
12672echo "FEATURES='pow-faucet runtime-benchmarks' BT_DEFAULT_TOKEN_WALLET=$(cat ~/.bittensor/wallets/$wallet/coldkeypub.txt | grep -oP '"ss58Address": "\K[^"]+') bash scripts/localnet.sh" > setup_and_run.sh
12673chmod +x setup_and_run.sh
12674tmux new-session -d -s localnet -n 'localnet'
12675tmux send-keys -t localnet 'bash ../subtensor/setup_and_run.sh' C-m
12676
12677# Notify the user
12678echo ">> localnet.sh is running in a detached tmux session named 'localnet'"
12679echo ">> You can attach to this session with: tmux attach-session -t localnet"
12680
12681# Register a subnet (this needs to be run each time we start a new local chain)
12682btcli subnet create --wallet.name $wallet --wallet.hotkey default --subtensor.chain_endpoint ws://127.0.0.1:9946 --no_prompt
12683
12684# Transfer tokens to miner and validator coldkeys
12685export BT_MINER_TOKEN_WALLET=$(cat ~/.bittensor/wallets/miner/coldkeypub.txt | grep -oP '"ss58Address": "\K[^"]+')
12686export BT_VALIDATOR_TOKEN_WALLET=$(cat ~/.bittensor/wallets/validator/coldkeypub.txt | grep -oP '"ss58Address": "\K[^"]+')
12687
12688btcli wallet transfer --subtensor.network ws://127.0.0.1:9946 --wallet.name $wallet --dest $BT_MINER_TOKEN_WALLET --amount 1000 --no_prompt
12689btcli wallet transfer --subtensor.network ws://127.0.0.1:9946 --wallet.name $wallet --dest $BT_VALIDATOR_TOKEN_WALLET --amount 10000 --no_prompt
12690
12691# Register wallet hotkeys to subnet
12692btcli subnet register --wallet.name miner --netuid 1 --wallet.hotkey default --subtensor.chain_endpoint ws://127.0.0.1:9946 --no_prompt
12693btcli subnet register --wallet.name validator --netuid 1 --wallet.hotkey default --subtensor.chain_endpoint ws://127.0.0.1:9946 --no_prompt
12694
12695# Add stake to the validator
12696btcli stake add --wallet.name validator --wallet.hotkey default --subtensor.chain_endpoint ws://127.0.0.1:9946 --amount 10000 --no_prompt
12697
12698# Ensure both the miner and validator keys are successfully registered.
12699btcli subnet list --subtensor.chain_endpoint ws://127.0.0.1:9946
12700btcli wallet overview --wallet.name validator --subtensor.chain_endpoint ws://127.0.0.1:9946 --no_prompt
12701btcli wallet overview --wallet.name miner --subtensor.chain_endpoint ws://127.0.0.1:9946 --no_prompt
12702
12703cd ../bittensor-subnet-template
12704
12705
12706# Check if inside a tmux session
12707if [ -z "$TMUX" ]; then
12708 # Start a new tmux session and run the miner in the first pane
12709 tmux new-session -d -s bittensor -n 'miner' 'python neurons/miner.py --netuid 1 --subtensor.chain_endpoint ws://127.0.0.1:9946 --wallet.name miner --wallet.hotkey default --logging.debug'
12710
12711 # Split the window and run the validator in the new pane
12712 tmux split-window -h -t bittensor:miner 'python neurons/validator.py --netuid 1 --subtensor.chain_endpoint ws://127.0.0.1:9946 --wallet.name validator --wallet.hotkey default --logging.debug'
12713
12714 # Attach to the new tmux session
12715 tmux attach-session -t bittensor
12716else
12717 # If already in a tmux session, create two panes in the current window
12718 tmux split-window -h 'python neurons/miner.py --netuid 1 --subtensor.chain_endpoint ws://127.0.0.1:9946 --wallet.name miner --wallet.hotkey default --logging.debug'
12719 tmux split-window -v -t 0 'python neurons/validator.py --netuid 1 --subtensor.chain_endpoint ws://127.0.0.1:9946 --wallet.name validator --wallet.hotkey default --logging.debug'
12720fi
12721
12722
12723
12724---
12725File: /scripts/install_validator_on_ubuntu.sh
12726---
12727
12728#!/bin/bash
12729set -u
12730
12731# enable command completion
12732set -o history -o histexpand
12733
12734abort() {
12735 printf "%s\n" "$1"
12736 exit 1
12737}
12738
12739getc() {
12740 local save_state
12741 save_state=$(/bin/stty -g)
12742 /bin/stty raw -echo
12743 IFS= read -r -n 1 -d '' "$@"
12744 /bin/stty "$save_state"
12745}
12746
12747exit_on_error() {
12748 exit_code=$1
12749 last_command=${@:2}
12750 if [ $exit_code -ne 0 ]; then
12751 >&2 echo "\"${last_command}\" command failed with exit code ${exit_code}."
12752 exit $exit_code
12753 fi
12754}
12755
12756shell_join() {
12757 local arg
12758 printf "%s" "$1"
12759 shift
12760 for arg in "$@"; do
12761 printf " "
12762 printf "%s" "${arg// /\ }"
12763 done
12764}
12765
12766# string formatters
12767if [[ -t 1 ]]; then
12768 tty_escape() { printf "\033[%sm" "$1"; }
12769else
12770 tty_escape() { :; }
12771fi
12772tty_mkbold() { tty_escape "1;$1"; }
12773tty_underline="$(tty_escape "4;39")"
12774tty_blue="$(tty_mkbold 34)"
12775tty_red="$(tty_mkbold 31)"
12776tty_bold="$(tty_mkbold 39)"
12777tty_reset="$(tty_escape 0)"
12778
12779ohai() {
12780 printf "${tty_blue}==>${tty_bold} %s${tty_reset}\n" "$(shell_join "$@")"
12781}
12782
12783wait_for_user() {
12784 local c
12785 echo
12786 echo "Press RETURN to continue or any other key to abort"
12787 getc c
12788 # we test for \r and \n because some stuff does \r instead
12789 if ! [[ "$c" == $'\r' || "$c" == $'\n' ]]; then
12790 exit 1
12791 fi
12792}
12793
12794# install prerequisite packages
12795install_pre() {
12796 sudo apt update
12797 sudo apt install --no-install-recommends --no-install-suggests -y sudo apt-utils curl git cmake build-essential
12798 exit_on_error $?
12799}
12800
12801# check if python is installed, if not install it
12802install_python() {
12803 # Check if python3.11 is installed
12804 if command -v python3.11 &> /dev/null
12805 then
12806 # Check the version
12807 PYTHON_VERSION=$(python3.11 --version 2>&1)
12808 if [[ $PYTHON_VERSION == *"Python 3.11"* ]]; then
12809 ohai "Python 3.11 is already installed."
12810 else
12811 ohai "Linking python to python 3.11"
12812 sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
12813 python -m pip install cffi
12814 python -m pip install cryptography
12815 fi
12816 else
12817 ohai "Installing Python 3.11"
12818        sudo add-apt-repository -y ppa:deadsnakes/ppa
12819        sudo apt install -y python3.11
12820 sudo update-alternatives --install /usr/bin/python python /usr/bin/python3.11 1
12821 python -m pip install cffi
12822 python -m pip install cryptography
12823 fi
12824
12825 # check if PDM is installed
12826 if command -v pdm &> /dev/null
12827 then
12828 ohai "PDM is already installed."
12829 echo "Checking PDM version..."
12830 pdm --version
12831 else
12832 ohai "Installing PDM..."
12833 sudo apt install -y python3.12-venv
12834 curl -sSL https://pdm-project.org/install-pdm.py | python3 -
12835
12836 local bashrc_file="/root/.bashrc"
12837 local path_string="export PATH=/root/.local/bin:\$PATH"
12838
12839 if ! grep -Fxq "$path_string" $bashrc_file; then
12840 echo "$path_string" >> $bashrc_file
12841 echo "Added $path_string to $bashrc_file"
12842 else
12843 echo "$path_string already present in $bashrc_file"
12844 fi
12845
12846 export PATH=/root/.local/bin:$PATH
12847
12848 echo "Checking PDM version..."
12849 pdm --version
12850 fi
12851}
12852
12853# install redis
12854install_redis() {
12855 if command -v redis-server &> /dev/null
12856 then
12857 ohai "Redis is already installed."
12858 echo "Checking Redis version..."
12859 redis-server --version
12860 else
12861 ohai "Installing Redis..."
12862
12863 sudo apt install -y redis-server
12864
12865 echo "Starting Redis server..."
12866 sudo systemctl start redis-server.service
12867
12868 echo "Checking Redis server status..."
12869        sudo systemctl status redis-server.service --no-pager
12870 fi
12871}
12872
12873# install postgresql
12874install_postgresql() {
12875 if command -v psql &> /dev/null
12876 then
12877 ohai "PostgreSQL is already installed."
12878 echo "Checking PostgreSQL version..."
12879 psql --version
12880
12881 # Check if the database exists
12882 DB_EXISTS=$(sudo -u postgres psql -tAc "SELECT 1 FROM pg_database WHERE datname='compute_subnet_db'")
12883 if [ "$DB_EXISTS" == "1" ]; then
12884 echo "Database compute_subnet_db already exists."
12885 else
12886 echo "Creating database compute_subnet_db..."
12887 sudo -u postgres createdb compute_subnet_db
12888 fi
12889 else
12890 echo "Installing PostgreSQL..."
12891 sudo apt install -y postgresql postgresql-contrib
12892
12893 echo "Starting PostgreSQL server..."
12894 sudo systemctl start postgresql.service
12895
12896 echo "Setting password for postgres user..."
12897 sudo -u postgres psql -c "ALTER USER postgres PASSWORD 'password';"
12898
12899 echo "Creating database compute_subnet_db..."
12900 sudo -u postgres createdb compute_subnet_db
12901 fi
12902}
12903
12904# install btcli
12905install_btcli() {
12906 if command -v btcli &> /dev/null
12907 then
12908 ohai "BtCLI is already installed."
12909 else
12910 ohai "Installing BtCLI..."
12911
12912 sudo apt install -y pipx
12913 pipx install bittensor
12914 source ~/.bashrc
12915 fi
12916}
12917
12918# install docker
12919install_docker() {
12920 if command -v docker &> /dev/null; then
12921 ohai "Docker is already installed."
12922 return 0
12923 else
12924 ohai "Installing Docker..."
12925 sudo apt-get update -y
12926 sudo apt-get install -y ca-certificates curl
12927 sudo install -m 0755 -d /etc/apt/keyrings
12928 sudo curl -fsSL https://download.docker.com/linux/ubuntu/gpg -o /etc/apt/keyrings/docker.asc
12929 sudo chmod a+r /etc/apt/keyrings/docker.asc
12930
12931 # Add the repository to Apt sources:
12932 echo \
12933 "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu \
12934 $(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
12935 sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
12936 sudo apt-get update -y
12937 sudo apt-get install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
12938        sudo groupadd docker 2>/dev/null || true  # group may already exist from the package install
12939        sudo usermod -aG docker $USER
12940        newgrp docker  # note: this starts a subshell; logging out and back in also applies the new group
12941 fi
12942}
12943
12944ohai "This script will install:"
12945echo "bittensor"
12946echo "docker"
12947
12948
12949wait_for_user
12950install_pre
12951install_btcli
12952install_docker
12953
12954
12955
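The installer above only reports success step by step; a quick, illustrative way to confirm that the expected binaries actually ended up on PATH is a small Python check like this sketch (the tool list is an assumption based on the functions the script actually calls):

```python
import shutil

# Assumed tool list, based on install_pre, install_btcli and install_docker above.
REQUIRED_TOOLS = ["curl", "git", "btcli", "docker"]

def check_tools(tools: list[str]) -> bool:
    """Print where each tool was found and return True only if all are present."""
    all_present = True
    for tool in tools:
        path = shutil.which(tool)
        print(f"{tool}: {path if path else 'MISSING'}")
        all_present = all_present and path is not None
    return all_present

if __name__ == "__main__":
    raise SystemExit(0 if check_tools(REQUIRED_TOOLS) else 1)
```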
12956---
12957File: /tests/__init__.py
12958---
12959
12960
12961
12962
12963---
12964File: /README.md
12965---
12966
12967# Datura Compute Subnet
12968
12969# Compute Subnet on Bittensor
12970
12971Welcome to the **Compute Subnet on Bittensor**! This project enables a decentralized, peer-to-peer GPU rental marketplace, connecting miners who contribute GPU resources with users who need computational power. Our frontend interface is available at [celiumcompute.ai](https://celiumcompute.ai), where you can easily rent machines from the subnet.
12972
12973## Table of Contents
12974
12975- [Introduction](#introduction)
12976- [High-Level Architecture](#high-level-architecture)
12977- [Getting Started](#getting-started)
12978 - [For Renters](#for-renters)
12979 - [For Miners](#for-miners)
12980 - [For Validators](#for-validators)
12981- [Contact and Support](#contact-and-support)
12982
12983## Introduction
12984
12985The Compute Subnet on Bittensor is a decentralized network that allows miners to contribute their GPU resources to a global pool. Users can rent these resources for computational tasks, such as machine learning, data analysis, and more. The system ensures fair compensation for miners based on the quality and performance of their GPUs.
12986
12987
12988## High-Level Architecture
12989
12990- **Miners**: Provide GPU resources to the network; their machines are evaluated and scored by validators.
12991- **Validators**: Securely connect to miner machines to verify hardware specs and performance. They maintain the network's integrity.
12992- **Renters**: Rent computational resources from the network to run their tasks.
12993- **Frontend (celiumcompute.ai)**: The web interface facilitating easy interaction between miners and renters.
12994- **Bittensor Network**: The decentralized blockchain on which validators manage and pay out compensation to miners in its native token, $TAO.
12995
12996## Getting Started
12997
12998### For Renters
12999
13000If you are looking to rent computational resources, you can easily do so through the Compute Subnet. Renters can:
13001
130021. **Sign up** at [celiumcompute.ai](https://celiumcompute.ai).
130032. **Browse** available GPU resources.
130043. **Select** machines based on GPU type, performance, and price.
130054. **Deploy** and monitor your computational tasks using the platform's tools.
13006
13007To start renting machines, visit [celiumcompute.ai](https://celiumcompute.ai) and access the resources you need.
13008
13009### For Miners
13010
13011Miners can contribute their GPU-equipped machines to the network. The machines are scored and validated based on factors like GPU type, number of GPUs, bandwidth, and overall GPU performance. Higher performance results in better compensation for miners.
13012
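The actual scoring logic lives in the validator code and is not reproduced here. Purely as an illustration of how such factors could be combined (the function, weights, and numbers below are hypothetical, not the subnet's real formula):

```python
# Toy illustration only; the real validator scoring and weights differ.
def toy_gpu_score(gpu_model_factor: float, num_gpus: int,
                  bandwidth_gbps: float, perf_tflops: float) -> float:
    """Combine hypothetical hardware factors into a single score."""
    return gpu_model_factor * num_gpus * (0.3 * bandwidth_gbps + 0.7 * perf_tflops)

# Example: a 4-GPU machine with model factor 1.2, 10 Gbps bandwidth, 80 TFLOPS.
print(toy_gpu_score(1.2, 4, 10.0, 80.0))
```
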
13013If you are a miner and want to contribute GPU resources to the subnet, please refer to the [Miner Setup Guide](neurons/miners/README.md) for instructions on how to:
13014
13015- Set up your environment.
13016- Install the miner software.
13017- Register your miner and connect to the network.
13018- Get compensated for providing GPUs!
13019
13020### For Validators
13021
13022Validators play a crucial role in maintaining the integrity of the Compute Subnet by verifying the hardware specifications and performance of miners’ machines. Validators ensure that miners are fairly compensated based on their GPU contributions and prevent fraudulent activities.
13023
13024For more details, visit the [Validator Setup Guide](neurons/validators/README.md).
13025
13026
13027## Contact and Support
13028
13029If you need assistance or have any questions, feel free to reach out:
13030
13031- **Discord Support**: [Dedicated Channel within the Bittensor Discord](https://discord.com/channels/799672011265015819/1291754566957928469)
13032