Removing old Git repositories from GitLab
by A. Wilcox
Hello all,
Now that we've migrated to the new GitLab, I think it is time we revisit
the repositories we are still keeping (in archived form):
* aports.git and aports-old.git
Both are over 100 MB, and neither contains anything useful to us
any more. They're snapshots of 2017 and 2018 Alpine packages, sometimes
with improvements that made it to system/ or user/. If we want to
investigate how Alpine did something we can just look in their current
Git instead of an old mirror. I'd like to remove both of these.
* portageplus.git
This is a set of patches to an old version of Portage, circa early 2017,
that adds support for emitting APK binpkgs. It also defaults to Galapagos instead of
Gentoo as a fallback repository. (How many here even remember Galapagos?)
We can probably archive the apkkit patch for posterity and then delete
it, IMO.
* patches.git
musl compatibility patches for some packages in /etc/portage/patches format.
My suggestion is to take what we haven't packaged in packages.git and save
it somewhere (or maybe just package it?), then remove it.
* etc-portage.git, apkkit-conf.git, systemsite.git
Old infrastructure from when we were a Gentoo fork. Maybe archive for
posterity, or delete it.
Comments welcome.
--arw
--
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org
Proposal: Replacing Integricloud with Scaleway
by A. Wilcox
== Table of Contents ==
* Executive Summary
* How Did We Get Here?
* Reliability Is Not Availability
* Enter Scaleway
* Hard Numbers
* Conclusion
* References
== Executive Summary ==
This is a formal proposal to retire the dedicated server we have with
Integricloud and replace it with a set of virtual servers from Scaleway.
We originally chose Integricloud's dedicated server offering primarily
for reliability and security. While it has proven secure, and the
hardware itself is reliable, its availability leaves something to be
desired.
Scaleway offers a similar level of reliability, and has a higher level
of availability based on our current account with them. They
additionally offer servers that are not based on the x86 architecture,
so we are still protected from the numerous issues that plague x86.
This will also reduce our hosting costs by almost 90%, and should reduce
downtime by nearly 100%.
== How Did We Get Here? ==
In early January 2019, we were notified that both of our dedicated
servers at Rack911 were being retired, with very little notice. For
some additional information, see the adelie-devel@ post with message
ID <ba35ebd3-54b4-f18f-b65f-d327e9d0af80@adelielinux.org>
(archived at [1]).
After our sponsorship was pulled in October 2018, we had done a bit of
investigation into replacement hosting providers in the event that this
would happen. Our requirements at the time were:
* non-x86 based (due to the plethora of x86 bugs being discovered)
* at least 8 GB RAM
* dedicated hardware preferred
* at least 3 IPv4 addresses
We evaluated Packet.net for ARM64 based systems[2] and Integricloud for
PPC64 based systems[3]. We found Integricloud to be approximately 60%
of the cost of Packet.net[4]. Additionally, we had a professional
working relationship with their parent company, Raptor Engineering, who
make the Talos and Blackbird family of computers. In fact, the
Integricloud system we were offered was to be a rack-mounted Talos II.
Since we already had a Talos II in use as a build server, we felt this
would be close to ideal, as any hardware oddities had already been
worked out.
We chose their 4-core (16-thread) PowerPC system with 8 GB RAM and 2 x 1
TB NVMe disk storage. One 1 TB NVMe disk is dedicated to
mirrormaster.adelielinux.org. The other 1 TB NVMe disk is an LVM group,
shared between the various KVM-based virtual servers run on it.
== Reliability Is Not Availability ==
The Integricloud dedicated server, chloe.adelielinux.org, has had no
hardware issues in over eight months of service. The hardware itself
has been fast, stable, and very reliable. However, there have been
multiple issues regarding availability.
Integricloud has a single-homed fibre infrastructure; per a public
looking glass, it is run via Mediacom[5]. This has caused an unforeseen
and consistent issue regarding availability.
Down              Back up           Duration
2019-04-16 13:17  2019-04-16 22:24  9 hours, 7 minutes
2019-04-17 00:10  2019-04-17 12:29  12 hours, 19 minutes
2019-07-09 06:25  2019-07-09 20:01  13 hours, 37 minutes
2019-07-10 15:14  2019-07-10 15:39  25 minutes
2019-07-12 16:35  2019-07-12 16:43  8 minutes
This has resulted in 97% uptime for April and 98% uptime for July, and
we are only 13 days into July, so this number could go down further.
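The uptime figures can be cross-checked from the outage log itself. The following is a minimal sketch, not the method actually used to produce the numbers above. It assumes each outage is counted against the calendar month in which it started, and that uptime is measured against the full length of that month (which is what makes the July figure come out near 98%). The timestamps only have minute resolution, so a computed duration may differ from the logged one by a minute.

```python
from datetime import datetime, timedelta

# Outage windows (went down, came back up), copied from the log above.
OUTAGES = [
    ("2019-04-16 13:17", "2019-04-16 22:24"),
    ("2019-04-17 00:10", "2019-04-17 12:29"),
    ("2019-07-09 06:25", "2019-07-09 20:01"),
    ("2019-07-10 15:14", "2019-07-10 15:39"),
    ("2019-07-12 16:35", "2019-07-12 16:43"),
]

FMT = "%Y-%m-%d %H:%M"


def downtime_for_month(year, month):
    """Total downtime of the outages that started in the given month."""
    total = timedelta()
    for down, up in OUTAGES:
        start = datetime.strptime(down, FMT)
        end = datetime.strptime(up, FMT)
        if (start.year, start.month) == (year, month):
            total += end - start
    return total


def uptime_percent(year, month):
    """Uptime as a percentage of the full calendar month (an assumption)."""
    first = datetime(year, month, 1)
    # First day of the following month, rolling over the year in December.
    following = datetime(year + (month == 12), month % 12 + 1, 1)
    month_length = following - first
    return 100.0 * (1 - downtime_for_month(year, month) / month_length)


for year, month in [(2019, 4), (2019, 7)]:
    down = downtime_for_month(year, month)
    print(f"{year}-{month:02d}: down {down}, uptime {uptime_percent(year, month):.2f}%")
```

Measured only against the 13 days of July that have elapsed, the July percentage would be lower than the full-month figure.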
Additionally, many ISPs are not accepting Mediacom's IPv6 route
announcements. This has caused mirrormaster to be inaccessible to many
of our users, and even one of the members of our own Infra Team[6].
Finally, while yours truly was trying to show an Adélie Web page to
someone while on public Wi-Fi at a well-known place in Broken Arrow, OK,
I was greeted with an error page[7]:
Sonicwall Network Security Appliance
This site has been blocked by the network administrator.
Block reason: Gateway GEO-IP Filter Alert
IP address: 23.155.224.64
Connection initiated towards country: Unknown
If a car dealership's firewall is blocking us, who knows what other
firewalls are blocking us? How many people are unable to discover us, and
how many corporate sponsors are we missing out on, because they can't
even connect to our Web site? And why can they not connect to our Web
site? It could be the IPv6 peering issue, or a firewall blocking our
IPv4 space, or because Mediacom has suffered another "fibre cut".
== Enter Scaleway ==
We have had a working relationship with Scaleway for almost a year and a
half. We launched our 32-bit ARM builder on the Scaleway ARM cloud in
March 2018, and have had no downtime in that time:
awilcox on erin [pts/0 Sat 13 9:33] ~: uptime
09:33:02 up 489 days, 5:59, load average: 0.00, 0.00, 0.00
The network has never suffered any outages, either. Since the Scaleway
cloud features ARM servers, we would additionally still be able to avoid
the x86 architecture and all of its failings.
We have continually been limited by our lack of IPv4 space at
Integricloud. Currently, we "proxy" every server via athdheise, a
virtual server on our Integricloud dedicated system that has both an
IPv4 and IPv6 address. All of our main systems are IPv6-only (wiki,
bts, next, etc), and when an IPv4 system attempts to connect to any of
these services, it has to be proxied via athdheise.
If we use Scaleway virtual servers, every system gets its own dedicated
IPv4 address, which drastically simplifies our administration.
Additionally, we would receive a lot more RAM per virtual server.
Currently, athdheise - the aforementioned Web server and proxy - has 256
MB RAM. It has 34 MB of available RAM. When documentation changes are
made and the Git hook runs to cause athdheise to rebuild the
documentation site (at help.adelielinux.org), sometimes the process runs
out of memory. This means one of us has to log in, stop the web server,
run the make process, and then restart the web server. The minimum RAM
at Scaleway is 2 GB per virtual server. This is an enormous amount of
headroom, and would even allow us to play with memcached (or other
caching solutions) to reduce latency across our infrastructure.
Finally, we would save a dramatic amount of money. We currently pay
225$/mo pre-tax for Integricloud.
== Hard Numbers ==
The current systems we run on Integricloud are:
System                          RAM       Disk
enfys (postgresql)               768 MB    30 GB
rarity (these mailing lists)    1536 MB    30 GB
mirrormaster                     256 MB     1 TB
bts (Bugzilla issue tracking)    512 MB     8 GB
athdheise (Web server/proxy)     256 MB     4 GB
wiki                             512 MB     8 GB
annwyn (Nextcloud)               512 MB   100 GB
chatterbox (Quassel IRC)         512 MB    40 GB
Since Scaleway tops out at 500 GB disk, we will need to consider
alternate hosting for mirrormaster. I believe we can run this on the
Hetzner dedicated server that is being sponsored by Alyx at Leuhta Labs.
And this is what we could pay per virtual system on Scaleway:
4 ARM CPUs, 2 GB RAM, 50 GB disk - 2.99€/mo
6 ARM CPUs, 4 GB RAM, 100 GB disk - 5.99€/mo
8 ARM CPUs, 8 GB RAM, 200 GB disk - 11.99€/mo
By my approximation, we would be able to put every single system except
annwyn on the smallest server, and annwyn on the second-smallest.
6× 2.99€ = 17.94€ per month
1× 4.99€ + 17.94€ = 22.93€ per month total cost, or approximately
25.81$. This is a savings of nearly 90% after tax.
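For anyone who wants to replay the arithmetic, here is a small sketch. The exchange rate is not stated anywhere in this proposal; the code assumes the rate implied by the conversion above (22.93€ to 25.81$). Tax is ignored. Since the price list shows 5.99€ for the second tier while the total above uses 4.99€, the sketch evaluates both values.

```python
CURRENT_USD = 225.00   # monthly Integricloud cost quoted in the proposal
SMALL_EUR = 2.99       # smallest Scaleway tier, per month
SMALL_COUNT = 6        # systems placed on the smallest tier

# Assumption: the rate implied by the proposal's own conversion.
USD_PER_EUR = 25.81 / 22.93


def monthly_total_eur(second_tier_eur):
    """Six small servers plus one second-tier server, in euros per month."""
    return SMALL_COUNT * SMALL_EUR + second_tier_eur


def savings_percent(total_eur):
    """Savings relative to the current monthly cost, ignoring tax."""
    return 100.0 * (1 - total_eur * USD_PER_EUR / CURRENT_USD)


for price in (4.99, 5.99):
    total = monthly_total_eur(price)
    print(f"second tier at {price:.2f} EUR: {total:.2f} EUR/mo, "
          f"{savings_percent(total):.1f}% saved")
```

Either way the savings stay a little under 90%, so the conclusion does not depend on which of the two prices is the right one.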
== Conclusion ==
I believe that retiring our Integricloud dedicated server and replacing
it with Scaleway virtual ARM servers makes business sense. It will
allow us to spend less time down, dramatically improve the architecture
of our infrastructure, and reach more people. That, in turn, will allow
us to grow into a larger, healthier Linux distribution that can
genuinely improve the world.
I do not want to leave this proposal without a separate smaller proposal
for how this could be effected easily. I believe that we can simply
start by migrating the wiki server, since it is the least used service.
We can feel out Scaleway's ARM offering for a while, and make sure that
it will genuinely work for our needs. After we are satisfied, we can
change the DNS for the wiki and begin work on another server. Assuming
all goes well, we will eventually be able to quietly power off the
Integricloud dedicated system with zero further downtime.
Thank you so much for reading this proposal. I welcome any comments or
questions you may have. You may respond here or poke me on IRC. I'll
post a summary email in response with any important notes from IRC.
Best,
--arw
== References ==
[1]:
https://lists.adelielinux.org/hyperkitty/list/adelie-devel@lists.adelieli...
[2]: https://www.packet.com/cloud/servers/c1-large-arm/
[3]: https://www.integricloud.com/
[4]: The Packet.net ARM box runs at 360$/mo. Integricloud is 220$/mo.
[5]: https://bgp.he.net/AS46246
[6]:
<aranea> awilfox: Looks like my routing issues are Mediacom's (that's
Raptor's only upstream) fault. I doubt I'll have any success contacting
them; this needs to come from a customer. I'll try contacting tpearson
again with more details; if he doesn't respond, I may have to ask you to
file an outage report or sth.
<aranea> Short version: Mediacom doesn't follow some standard industry
practices, and thus many of their peers aren't accepting the routes they
announce on behalf of their customers (and guess what, Raport is their
only IPv6 customer.)
[7]: https://i.imgur.com/khmebJ5.png
--
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org
Re: Proposal: Replacing Integricloud with Scaleway
by Kiyoshi Aman
I think that, before we can make a good decision regarding migration, we
should look at the resources we already have available to us. We have a
dedicated server in Finland with 32GB RAM and a fair amount of disk space
available. We have two dedicated servers generously provided for our use in
Pennsylvania, albeit one earmarked for buildbot service and the other as
yet unprovisioned (by us) for service.
In light of that, I want to take the server list below in reverse order.
A. Wilcox wrote:
> The current systems we run on Integricloud are:
>
> enfys (postgresql) 768 MB RAM 30 GB disk
> rarity (these mailing lists) 1536 MB RAM 30 GB disk
> mirrormaster 256 MB RAM 1 TB disk
> bts (Bugzilla issue tracking) 512 MB RAM 8 GB disk
> athdheise (Web server/proxy) 256 MB RAM 4 GB disk
> wiki 512 MB RAM 8 GB disk
> annwyn (Nextcloud) 512 MB RAM 100 GB disk
> chatterbox (Quassel IRC) 512 MB RAM 40 GB disk
At the moment, both chatterbox and annwyn are personal resources. Leaving
aside the discussion as to whether they belong on project infrastructure,
they should be migrated to personal infrastructure unless they are intended
to be made more widely available to Adélie contributors (even if only to
the core and/or infrastructure teams).
athdheise only exists because IntegriCloud was not able to provide IPv4
addresses at a price we were able to pay. It would be retired regardless.
My understanding is that we were planning on retiring the wiki. This would
be an excellent time to do so.
I think that bts should be retired and merged with gitlab, or alternatively
it can be on the same server if retiring it is contra-indicated (e.g. due
to gitlab being unable to provide bug-tracking without a git repo
associated).
mirrormaster would need to be migrated to the Finland server regardless,
since no VPS provider provides block storage in the quantities we need at a
rate we can live with.
The mailing lists should be on hosting separate from our other
infrastructure, since they can and should be usable regardless of the state
of the rest of our infrastructure.
That leaves the postgresql server, which should be co-located with gitlab.
I understand that one of our goals is for our infrastructure to not be
subject to architectural issues with x86_64. In principle I agree, but
migrating the majority of our infrastructure from VPSes on a single
dedicated server to VPSes on an unknown number of servers, especially in
today's security environment, carries more risk than ensuring our
infrastructure sits on hardware we know is used only by us.
Splitting debug packages into separate repositories
by A. Wilcox
Hi all,
I would like to formally suggest that we take our -dbg packages and put
them in a split repository, such as system-debug and user-debug.
Non-stratum 1 mirrors could choose whether or not to mirror these as
well. This would take a significant amount of disk space pressure off
of stratum 2/3 mirrors.
Output for:
du -b *-dbg-*.apk | awk 'BEGIN { s = 0; } { s += $1; } END { print s; }'
system/ppc: 1853417479
system/ppc64: 1877177133
user/ppc: 1.73955e+10
user/ppc64: 1.9961e+10
So, the overall size reduction would be about 20 GB per architecture.
Consider that for ppc64, the entirety of system/ is 3.8G; user/, 28G.
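For mirror operators who would rather script the measurement, the same sum can be sketched in Python. This is only an illustration of what the du/awk pipeline computes; `debug_package_bytes` and `as_gib` are made-up helper names, and the directory passed in is whatever repository directory you want to inspect.

```python
from pathlib import Path


def debug_package_bytes(repo_dir):
    """Sum the apparent sizes of all *-dbg-*.apk files directly in repo_dir."""
    return sum(p.stat().st_size for p in Path(repo_dir).glob("*-dbg-*.apk"))


def as_gib(num_bytes):
    """Convert a byte count to GiB."""
    return num_bytes / 2**30


# Totals reported above for ppc64; the user/ value is rounded as awk printed it.
ppc64_total = 1877177133 + 1.9961e10
print(f"ppc64 debug packages: about {as_gib(ppc64_total):.1f} GiB")
```

Unlike the awk one-liner, Python prints integer sums exactly instead of switching to scientific notation for large values.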
Please respond with any thoughts or comments.
Best,
--arw
--
A. Wilcox (awilfox)
Project Lead, Adélie Linux
https://www.adelielinux.org