Integrating installation of Internet Archive as a server

I’m trying to install an Internet Archive server so that it can potentially become one of the options available on Rachel.
This is not just a new content module, so the places to do this are obviously widespread throughout the code, and I haven’t so far found them collated anywhere. I have a hand-installed version running and can repeat that process based on the instructions at https://github.com/internetarchive/dweb-mirror/blob/master/README-rachel.md
For example …

  • Appearing in the control panel for any administration
  • Appearing as an option for the user
  • Fetching the files from github or npm and setting up the distribution.
  • Integrating a more recent version of node into the base distribution.
I can probably make the appropriate changes, but I’m not going to be able to do this without some help in finding where the changes need to be made, and any other considerations to make it play nicely.
This is, of course, a work in progress, and will probably change a fair amount once it’s used in the field for real deployments. What I’m wondering about is whether there is interest in this being part of the Rachel distribution and, if so, whether someone can help with this process.

Hi @mitra – this is awesome. We’re in a bit of a pinch right now in terms of resources to help (we’re a small team dealing with financial year end), but I want to do this for sure. Can it just be on me to get back to you about a month from now?

Yes - we can do that. I’m going to mention it at a talk I’m giving at Internet Freedom Festival in a week, but I’m also cranking on a number of bugs between now and then.

Hi @jeremy, we are now part of the installation process on Internet In A Box, so it would be good to get it installed as part of the normal process on Rachel as well. Are you available to help with this?

Hi @mitra – we’re still just slammed supporting existing deployments, largely with integrating and upgrading Kolibri and other software at the moment. We do want to do this, but we are generally very careful around this type of thing and will need to have more capacity to work on it than we do at the moment.

I hope you don’t take this negatively, I wish we had more resources at the moment to take on new things. We will get there, our work always comes in waves, we just don’t have the capacity quite yet.

Understood - our server can run on Rachel, but the integration steps require assistance by people who know the system, as they did with Internet In A Box (IIAB).

We now have a full integration into the default Internet In A Box so we will send the enquiries in that direction until someone has a couple of hours to help complete this.

I can help you on the Raspberry Pi side of RACHEL if you want. Where are you looking to integrate your server?

Hi Jamesk - Thanks for offering, I’m not sure about the “where” part of your question.

We haven’t tried running on the Rachel-on-Pi combo, only on the Rachel box, which in particular has a problem with an ancient version of node (there is now a 32-bit version of node, but it would require the team to upgrade to it).

On the RPI I have to admit I don’t even know the process to put Rachel on one. We have a full install for the Internet Archive offline server at https://github.com/internetarchive/dweb-mirror and it has instructions for a Rachel3+ box, a generic RPI, an RPI with IIAB, and a Mac. In each of these the install process for dweb-mirror itself is reasonably smooth, but the hard parts are:

  • getting the prerequisites installed in a way that is compatible with the platform (especially, for example, putting a reasonably recent version of Node on Rachel)
  • integrating with the auto-start processes so it starts at boot time
  • integrating into the UX so that it appears as a button or link alongside things like Wikipedia.

So far, doing each of these in a way that works well with the platform has been highly dependent on the platform, e.g. “service” vs “supervisorctl” vs /etc/rc.d for startup, so it needs help from someone who understands the platform.
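A minimal sketch of how an install script might tell which of those three startup mechanisms a machine offers. The function name and the order of the checks are assumptions for illustration, not something taken from dweb-mirror or RACHEL.

```shell
#!/bin/sh
# Print which of the startup mechanisms named above this machine appears to offer.
# Hypothetical helper: the order of the checks is an assumption, not a recommendation.
detect_startup_mechanism() {
    if command -v supervisorctl >/dev/null 2>&1; then
        echo "supervisorctl"
    elif command -v service >/dev/null 2>&1; then
        echo "service"
    elif [ -d /etc/rc.d ]; then
        echo "rc.d"
    else
        echo "unknown"
    fi
}

detect_startup_mechanism
```

An installer could branch on the printed value to pick the matching startup integration, and stop with a clear message on "unknown".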

In the long run, of course, it’s great if/when it becomes one of the optional installs in the standard package. In the meantime, where we need the help for Rachel is at least getting it to the point where it’s a simple installation (a combination of instructions and scripts) that someone of slightly less than uber-geek technical capability can follow.

I’ll take a look at integrating this into the RACHEL-Pi over the next few days and when I get it working I’ll write an installation script for it. Hopefully that can then be modified by the RACHEL team to make the plus installation easier.

For the RACHEL-Pi I have provided some premade images based on Raspbian Stretch Lite operating system at this post. The image just needs to be flashed to a MicroSD or USB drive using Etcher and it’s ready to go. I base those images on the Lite version of Raspbian as the desktop UI and other packages aren’t required. I use a variation of this installation script which I’ve put on github to install all of the prerequisites and files to turn it into a RACHEL-Pi.

I’ll try and explain in more detail about the RACHEL setup and how this will go.

The reason service files are used is because Kolibri and KA-Lite use them on their end. I try not to make any drastic changes from World Possible’s code base other than for updating and getting things working on the pi and I suspect they don’t want to switch things over for the plus while those servers still use service files. It would make it more difficult to update them.

A module is what the user interacts with in the RACHEL user interface. Modules are just folders with all of the offline content files in them. Inside each module folder is a file called rachel-index.php and that is what is shown to the user in the interface. The RACHEL user interface is the default index.php page at /var/www/index.php on the pi. You can see this file on github at index.php. At around line 85 you can see the start of the content section and a function called getmods_fs. This looks through the modules folder at /var/www/modules for folders that contain rachel-index.php files and then shows those pages to the user in a list style. There’s a bit more involving the database for hiding modules and sorting them by order, but that’s the basis of what’s happening…
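The folder scan described above can be approximated from the command line. This is only a shell sketch of the idea, not the PHP in getmods_fs: the function name list_modules is hypothetical, and it ignores the database side (hiding and ordering).

```shell
#!/bin/sh
# List module folders that contain a rachel-index.php, one folder name per line.
# Hypothetical helper; the default path is the one given in the post.
list_modules() {
    modules_dir="${1:-/var/www/modules}"
    for dir in "$modules_dir"/*/; do
        # Skip the unexpanded glob when the folder is empty or missing.
        [ -d "$dir" ] || continue
        if [ -f "${dir}rachel-index.php" ]; then
            basename "$dir"
        fi
    done
}
```

Running `list_modules` on the pi, or `list_modules /media/RACHEL/rachel/modules` for another location, shows which folders the interface would be able to pick up.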

A basic rachel-index.php for a module has a logo, a link to the home page for the module, and a description. You can see the module template on github here. So to create a module you just create a folder named en-internet_archive and inside would be a rachel-index.php and a logo.png file. Then you’d edit the rachel-index.php to link directly to the local Internet Archive server at port 4244 and fill in the description and it’s ready to go. Because it’s PHP you can use the PHP superglobal $_SERVER to get the right address. This would be <?php $host = "//$_SERVER[HTTP_HOST]:4244"; ?> and then insert that into the HTML link. I’ll make this basic module for you, but figured I’d explain in case you’re interested. If your server provides a default interface page at that address that’s all that needs to be done. When the user clicks the link it will go to the server.
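A rough sketch of creating that module folder from a script. The function name make_ia_module is hypothetical. The HTML around the link is placeholder markup rather than the real RACHEL module template, and the logo path is an assumption about how the page is served.

```shell
#!/bin/sh
# Create a minimal en-internet_archive module folder under the given modules directory.
# Hypothetical helper: the markup and logo path are placeholders, not the RACHEL template.
make_ia_module() {
    module_dir="$1/en-internet_archive"
    mkdir -p "$module_dir"
    # Quoted heredoc delimiter so the shell leaves the PHP "$" variables untouched.
    cat > "$module_dir/rachel-index.php" <<'EOF'
<?php $host = "//$_SERVER[HTTP_HOST]:4244"; ?>
<div>
  <a href="<?php echo $host ?>"><img src="modules/en-internet_archive/logo.png" alt="Internet Archive"></a>
  <h2><a href="<?php echo $host ?>">Internet Archive</a></h2>
  <p>Placeholder description of the offline Internet Archive server.</p>
</div>
EOF
}
```

Calling `make_ia_module /var/www/modules` would write the file in place, and a logo.png still has to be copied into the folder separately. If RACHEL itself is served on a non-default port, HTTP_HOST will already contain a port and the address would need adjusting.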

Some modules provided by RACHEL, like the KA-Lite ones, use an installation script called finish_install.sh (I wish this were python). This does any finishing steps necessary after copying the module folder to /var/www/modules. Your module will require a script like this because of the installation of a separate server and its prerequisites. This way everything can be contained as a module.

So the following will need to be made:

  • A service file to automatically load the IA server at startup
  • A finish_install.sh installation script to install the IA server, its prerequisites, and the service file
  • A module folder with everything in it and an edited rachel-index.php

If everything goes according to plan, that will all get zipped as en-internet_archive.zip and would be your module. It can then be hosted and installed through the admin interface like other modules. I’ll let you know how it goes.
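As an illustration of the first item, here is a sketch that writes a service file, assuming "service file" means a systemd unit. The unit name internetarchive.service and the function name are hypothetical, and the start command is passed in by the caller because the thread does not give it.

```shell
#!/bin/sh
# Write a systemd unit file into out_dir.
# Hypothetical helper: the unit name is an assumption, and exec_start must be
# the real absolute start command for the server, which is not given here.
write_service_file() {
    out_dir="$1"
    exec_start="$2"
    mkdir -p "$out_dir"
    cat > "$out_dir/internetarchive.service" <<EOF
[Unit]
Description=Internet Archive offline server
After=network.target

[Service]
ExecStart=$exec_start
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}
```

A finish_install.sh could call this with /etc/systemd/system as the output directory and then run `systemctl enable internetarchive.service` so the server starts at boot.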

James

Sounds good - I have an install.sh that gets run by yarn after loading the server code from npm, so it sounds like that would be called by finish_install.sh?

On other installations the server installs from its npm repos during the install process, but it sounds like you have a slightly different process, or maybe finish_install.sh should itself do that install from npm?

When you install a module from the admin interface in RACHEL it downloads the files for that module from World Possible’s rsync server directly into the /var/www/modules/ModuleName folder ( on the rachel-plus the modules folder is /media/RACHEL/rachel/modules/ ). If there’s a script called finish_install.sh present in the module folder after everything is transferred it will run that script.
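The hook described above amounts to something like the following. This is a sketch of the behaviour as described, not RACHEL's actual admin code, and the function name run_finish_install is hypothetical.

```shell
#!/bin/sh
# Run a module's finish_install.sh if the module folder contains one.
# Hypothetical helper illustrating the hook; not taken from the RACHEL code base.
run_finish_install() {
    module_dir="$1"
    if [ -f "$module_dir/finish_install.sh" ]; then
        # Run from inside the module folder so relative paths resolve there.
        (cd "$module_dir" && sh ./finish_install.sh)
    else
        echo "no finish_install.sh in $module_dir, nothing to do"
    fi
}
```

For example, `run_finish_install /var/www/modules/en-internet_archive` would run the module's script once the files have been transferred.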

I think that’s the perfect place to hook into so that this can all be installed automatically. It’s really just automating whatever process you have set up for installation. In this case the script will have to automate all of the steps you have listed on your github page under “Preliminaries to install” and then the steps in your INSTALLATION.md. It will also have to copy and set the service to run at boot. It looks like install.sh is run using that curl command in section 1 so the same command will have to be put into this script. Running multiple scripts can get tricky sometimes so we’ll see how that goes. Hopefully this can all be automated to work with a one time installation for users.

I should mention I don’t work for World Possible, I just volunteer and maintain the Raspberry Pi stuff. Anything with the plus devices as well as putting things on their rsync servers will be up to them. They may also have other ideas for getting this running.

James

So I have everything from your raspberry pi installation instructions automated with a script and installing, but during the “yarn add @internetarchive/dweb-mirror @internetarchive/dweb-archive” step there are several warnings and then it errors. The only other issue I ran into was that the npm provided in Raspbian warns about the version of nodejs and says it supports up to 9, while the nodejs in Raspbian is at v10.15.2, but this didn’t seem to be an issue unless it affects this yarn add error.

This is the output log from the yarn add error - output pastebin

This is on Raspbian Buster, which is the latest Raspberry Pi Distro and the one that I will be making new images for soon to support the new hardware. Is it possible things have changed and those instructions don’t work?

James

Thanks for the report and especially the log @jamesk

Most, if not all, of those warnings are deep in the package dependencies. Most are actually out of my control (I’ve posted issues in their repos) but I’ll also upgrade the one that is in my control.

The problem seems to be with the ‘gyp’ stuff, which I believe is where it’s trying to find a pre-built binary for the right version and then, when that fails, trying to build its own.

It’s possible that Raspbian Buster has changed things. I’m pretty sure my existing installs are on a previous version but I’m not sure how to tell … I see
Linux box.lan 4.14.98-v7+ #1200 SMP Tue Feb 12 20:27:48 GMT 2019 armv7l
on my RPI / Raspbian.

I’ve seen other people complaining about Buster breaking things, so it’s possible that there are dependencies here that haven’t upgraded. I don’t think the npm version support is the issue since we are using yarn which is usually more stable.

I’m not sure how to repeat what you are doing, or if that’s even possible with what I have here. I’ll be in SF on better bandwidth from Thursday.

Do you have an exact Steps to Repeat that got to this point, since I presume you are doing something to install Raspbian and Rachel before this step?

One experiment I’ve tried before with gyp issues on another platform was running “yarn add node-pre-gyp”, “yarn add node-gyp” and “yarn add gyp” before adding the dweb-archive or dweb-mirror packages. It seems to give it more choices of how to give/get packages.

That makes sense. I hate when dependencies are broken and out of my control. I’ve created and uploaded a new RACHEL-Pi image based on the latest Raspbian Buster Lite to the World Possible FTP. It’s also at this link.

These are the steps I do:

  1. Download the image and flash it to a MicroSD using Etcher
  2. Put it into the pi and make sure the pi is connected to your router over ethernet
  3. Get the ip address provided to your pi and log in using Putty. The login and password are “pi/rachel” and that can be changed after using “sudo raspi-config”.
  4. Run “sudo rfkill block wifi” to disable wifi so it’s not sharing your internet connection to anyone who connects to the hotspot while testing
  5. Download and then transfer this finish_install.sh script to the /var/tmp folder on your pi. You can do this either using winscp or with the file uploads module that is included by default in this RACHEL image. If you put the pi’s ip address in your browser to get to RACHEL and upload it, it will be at /var/www/modules/en-file_share/uploads and you can move it from there.
  6. cd /var/tmp
  7. sudo sh finish_install.sh

That’s my process. That’s a very basic script that just does the steps you have listed and it should show the same thing I get. I’ll try your other gyp experiments to see what happens.

James

It looks like that version of Buster is missing libsecret. I added apt-get install -y libsecret-1-dev to the script and it got past that step … I’ll just complete the installation and post a working script.

Yes - that got past this step … so it just needs “libsecret-1-dev” added to the things installed.
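A sketch of how the install script might add that package only when it is missing. The function name ensure_package and the DRY_RUN switch are hypothetical. By default it only prints the apt-get command, the same one quoted above, instead of running it.

```shell
#!/bin/sh
# Install a Debian package only if dpkg does not already report it as installed.
# Hypothetical helper: with DRY_RUN=1 (the default here) it prints the command only.
ensure_package() {
    pkg="$1"
    if dpkg-query -W -f='${Status}' "$pkg" 2>/dev/null | grep -q "install ok installed"; then
        echo "$pkg already installed"
    elif [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: apt-get install -y $pkg"
    else
        apt-get install -y "$pkg"
    fi
}

ensure_package libsecret-1-dev
```

Setting DRY_RUN=0 and running as root would perform the actual install.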

I think it’s back to you now for the Rachel-specific integration (UI, start/stop, etc.)?

I’m only intermittently online for about 48 hours as I’m flying back to SF. I will then be at Internet Archive and hopefully able to demo the server running on a Pi (I’ll have one with me) at Dweb Camp.

Alright sounds good. I’ll let you know when it’s working and post a zip with everything.

James

One strangeness is that I’m not seeing an inserted USB in /media/pi/XXX which I’m pretty sure I used to see. Before I hunt around to try and fix it, I wondered if there is something you change on Rachel to make the disk show up somewhere I’m not finding it.

Note - I see the disk at /dev/disk/by-label/MITRA64GB but it doesn’t seem to be automounting the contents as it has on other RPI configurations.

There’s a package available for Raspbian called usbmount that will automatically mount usb storage devices to /media/usb0, /media/usb1, etc. This isn’t included by default in Raspbian Lite images. I work from lite because the Raspberry Pi is low on resources and a lot of these packages aren’t necessary for RACHEL. USB mounting wasn’t necessary because RACHEL on the pi doesn’t have module loading from USB. That package is also broken for booting from USB in certain circumstances: it remounts drives in a way that lets anyone access them, and specifically the boot drive because it’s FAT. That is a major security issue.
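For a one-off test without usbmount, a drive that shows up under /dev/disk/by-label can be mounted by hand. The sketch below only prints the commands rather than running them. The function name and the /mnt/usb mount point are arbitrary choices for illustration.

```shell
#!/bin/sh
# Print (rather than run) the commands for a one-off manual mount of a labelled drive.
# Hypothetical helper: /mnt/usb is an arbitrary default mount point.
manual_mount_commands() {
    label="$1"
    mount_point="${2:-/mnt/usb}"
    echo "sudo mkdir -p $mount_point"
    echo "sudo mount /dev/disk/by-label/$label $mount_point"
}

manual_mount_commands MITRA64GB
```

This leaves the mount explicit and temporary, which avoids the automatic remounting behaviour described above.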
