Yesterday, Nicolas Meier released a River4 callback that sends all items in a river to a Pinboard account. This makes it possible to use the RSS feeds to populate a searchable database of news items. A good demo of the callback facility and feed technology.
If you have questions, ask on the River4 mail list.
And thanks to Nicolas for this contribution!
A subscription list is an OPML file in the lists folder of your River4 data folder.
Each list corresponds to a river. The feeds in the list are the feeds we read to find new items for the corresponding river.
Suppose you wanted a river of news about movies. You'd create an OPML file in the lists folder called movies.opml. River4 would read the feeds in that list periodically, and put the new items in the riverjs file in the rivers folder. The river file is called movies.js.
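The naming convention above can be sketched in a few lines of Node. This is only an illustration, not River4's own code: the helper name riverPathForList and the default folder name passed to it are assumptions made for the example.

```javascript
const path = require("path");

// Hypothetical helper (not part of River4): given the name of a list file
// in the lists folder, return the path of the river file that the text
// says corresponds to it.
function riverPathForList(listFile, dataFolder = "river4data") {
  // Strip the extension: "movies.opml" becomes "movies"
  const name = path.basename(listFile, path.extname(listFile));
  // The river file has the same name with a .js extension, in the rivers folder
  return path.posix.join(dataFolder, "rivers", name + ".js");
}

console.log(riverPathForList("movies.opml"));
```

So a list called movies.opml pairs with a river called movies.js, and adding a second list file would give you a second river file named the same way.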
To edit the list you can use a text editor to hand-edit the OPML, or you can use an outliner whose native format is OPML.
My outliner, Fargo, is perfectly suited for editing OPML subscription lists.
First create a new outline, using the File/New command in Fargo.
Create a new headline by pressing Return. Type a short description. Choose Add Feed in the Outliner menu, and paste the URL of an RSS or Atom feed. Repeat this for all the feeds you want to add to your list.
If you want to include another OPML list in this list, press Return to add a new headline. Type a short description, then choose Add Include in the Outliner menu, and enter the URL of an OPML file. When River4 processes the list, it will be as if all the feeds in that list were in this list. Here's a Fargo docs page on includes.
You can use the include feature in the previous step to create a placeholder list in your lists folder. Just have it include the list you're editing in Fargo. You can get the URL of the outline by choosing Get Public Link in Fargo's File menu. This is a bit complicated, but worth studying and understanding because it makes Fargo a simple way to tell River4 what feeds you want it to read.
It's easy to have a second or third river: just create another OPML file in the lists folder. It's probably good to get started with a single list, and get a feel for how busy the feeds are and how many new items you get every time you look at the river.
This is a checklist I've used for installing River4 on a fresh Ubuntu v14.04 server.
sudo apt-get update
sudo apt-get install nodejs
sudo apt-get install npm
sudo apt-get install nodejs-legacy
We also install npm, a requirement to run Node apps.
nodejs-legacy makes it possible to run apps by saying node app.js instead of having to use nodejs, an oddity of Ubuntu.
sudo npm install forever -g
There are lots of ways to get apps to launch in the background. I like forever because it keeps the app running even if it crashes. River4, of course, never crashes (heh) but you never really know.
sudo apt-get install git
I like to install git, because it makes it easy to install River4 from GitHub.
git clone https://github.com/scripting/river4.git
That should create a directory containing River4 at /home/ubuntu/river4
To run it, follow the instructions on the River4 howto.
A 15-minute video showing how to install River4 on a Mac system.
I promised at the end of the video to upload a few screen shots of what the river looks like after it's been running a while.
The dashboard after running for 38 minutes.
The NBA news tab at that time.
The NYT news tab.
A snapshot of my river4data folder after running for 90 minutes.
I also uploaded it to Facebook and think it's a bit higher quality.
These instructions show you how to install River4 on a machine that's never had Node running on it, using the file system for storage. You can also use S3 for storage, and there's a separate howto for that kind of installation.
These instructions assume you're installing on a Macintosh, but they're very close to installing on a Unix or Windows machine.
Download Node.js from this page. Get the Macintosh installer. You don't need the source code.
Run the pkg file you downloaded. Accept all defaults, install for all users. At the end it tells you where everything was installed. I didn't need this information. It also says to make sure /usr/local/bin is in your $PATH. I didn't need to do anything; apparently it was set up correctly by default.
Install the packages River4 needs.
Shortly after it launches, River4 automatically creates a river4data folder in the same folder as the River4 app. It contains sub-folders: data, lists and rivers.
For now, lists is the important folder. Copy a few OPML subscription lists into the lists folder. If you need some to help test your setup, you can download some examples here. Ultimately you'll want to create and maintain your own. Fargo is very good for that, since its native file format is OPML. Use the Add Feed command in the Outliner menu.
Let River4 run for a while. As soon as there are new items in the feeds in your lists, river files will show up in the rivers folder, one corresponding to each of your lists. These are used by river browser software to display the new items for readers. See the next section.
River4 is also a web server, running on port 1337.
If you go to the home page of the server, you'll see the contents of your rivers, and commands that take you to the dashboard, repository, mail list, this blog.
Here's a 15-minute video where I install a River4, famous-chef style.
It includes screen shots of the River4 dashboard and home page after 38 minutes, and a copy of a river4data folder after 90 minutes.
If you're going to run River4, I highly recommend joining the River4 mail list.
In River4 v0.114, I did more than Andrew recommends, and replaced all the calls to JSON.stringify in River4 with calls to the utility routine jsonStringify, and then made the fix inside that routine.
I've tested this with my three main rivers (two River4 installations) and it seems to work.
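The pattern here is to route every serialization through one routine so a fix only has to be made in one place. Below is a minimal sketch of that idea. The post doesn't say what the fix actually was, so the one shown (escaping non-ASCII characters as \uXXXX sequences) is purely an assumption for illustration, as is the function body.

```javascript
// Stand-in for the utility routine named in the text. The body is a guess:
// ASSUMPTION: the "fix" is to escape every non-ASCII character as \uXXXX
// after calling JSON.stringify. The real fix may be something else.
function jsonStringify(value) {
  return JSON.stringify(value, undefined, 4).replace(/[\u007f-\uffff]/g, (c) =>
    "\\u" + c.charCodeAt(0).toString(16).padStart(4, "0")
  );
}

// Callers use jsonStringify everywhere instead of JSON.stringify, so any
// later change to the escaping rule happens in exactly one function.
const item = { title: "Caf\u00e9 news" };
const text = jsonStringify(item);
console.log(text);
```

The output is still valid JSON, so parsing it gives back the original string. Only the bytes on disk or on the wire change.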
New in River4 v0.110.
I had to switch from Heroku to an AWS-hosted Ubuntu server that's hosting a bunch of other apps. Without the isolation that Heroku provides, the environment variables of all these apps were getting in each other's way. Rather than hack my way through all the arcane rules of environment variables, I decided to create a simpler, more reliable (imho) way to configure River4 and nodeStorage, using a config.json file in the same directory as river4.js.
Here's a list of values you can include in this file:
They work exactly the same way as the environment variables with the same names.
If you specify an environment variable and an element in config.json, the one in config.json takes precedence.
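The precedence rule can be sketched like this. It's a rough illustration only: getSetting, exampleSetting and otherSetting are made-up names, not settings or functions from River4, and the objects stand in for a parsed config.json and for process.env.

```javascript
// Hypothetical lookup that follows the rule in the text: if a value is
// present in config.json it wins, otherwise fall back to the environment
// variable with the same name.
function getSetting(name, config, env) {
  if (config[name] !== undefined) {
    return config[name];
  }
  return env[name];
}

// Illustrative data only; these names are not real River4 settings.
const config = { exampleSetting: "from config.json" };
const env = { exampleSetting: "from environment", otherSetting: "env only" };

console.log(getSetting("exampleSetting", config, env)); // from config.json
console.log(getSetting("otherSetting", config, env));   // env only
```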
If you're using S3 storage, you need to provide the three values for your AWS account. Since the Amazon library is looking for these in environment variables, you must provide them that way.
Here's an example, the contents of config.json on my server (with values changed).
There are three top-level folders: data, lists and rivers.
The only folder that contains input is the lists folder. Everything else is created and maintained by the River4 software.
The data folder has several sub-folders.
calendar is a calendar-structured folder broken down by year and month. Each file is a day's worth of news gathered from the feeds all your lists subscribe to.
feeds has a sub-folder for every feed your river is following. There's only one file in each sub-folder, but there's room for growth.
feedsInLists.json is a list of feeds, and a reference count for each feed. If the count goes to zero, we don't have to read the feed, because there aren't any lists subscribed to it.
feedsStats.json is a JSON array containing information about each of the feeds.
lists has a sub-folder for each of the lists. There's one file in each folder with info about the list.
prefsAndStats.json stores preference settings and overall stats for your River4 installation.
riversArray is an array of information about your rivers. You can edit the title and description for each river to control the display of the home page.
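To make the reference-count idea behind feedsInLists.json concrete, here's a small sketch. The in-memory shapes and function names are assumptions for the example. The actual layout of the file isn't described here.

```javascript
// Hypothetical model: each list name maps to the feed URLs it subscribes to.
// Returns an object mapping feed URL to the number of lists that include it.
function countFeedReferences(lists) {
  const counts = {};
  for (const feeds of Object.values(lists)) {
    for (const url of new Set(feeds)) { // count a feed once per list
      counts[url] = (counts[url] || 0) + 1;
    }
  }
  return counts;
}

// A feed only needs to be read if at least one list still refers to it.
function feedsWorthReading(counts) {
  return Object.keys(counts).filter((url) => counts[url] > 0);
}

const counts = countFeedReferences({
  movies: ["http://example.com/a.xml", "http://example.com/b.xml"],
  sports: ["http://example.com/b.xml"],
});
counts["http://example.com/old.xml"] = 0; // no list subscribes to it any more

console.log(counts);
console.log(feedsWorthReading(counts));
```

The feed with a count of zero stays in the table but is left out of the set of feeds to read.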
When you see a JSON file that has an element called enabled, if you set it false, the software will stop processing that object. So if you set the enabled element of a listInfo.json file to false, we will stop reading the list. It's useful if you want to keep everything in place, but turn off parts of the server.
You can turn the whole River4 aggregator and server off by setting enabled false in prefsAndStats.json.
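Here's a sketch of how an enabled check like this might gate processing. The object shapes are hypothetical, and treating a missing enabled element as "on" is my assumption. The text only says what happens when it's set to false.

```javascript
// ASSUMPTION: an object with no "enabled" element counts as enabled;
// only an explicit false switches it off.
function isEnabled(obj) {
  return obj.enabled !== false;
}

// Hypothetical: prefs stands for the parsed prefsAndStats.json, and
// listInfos for the parsed listInfo.json files, one per list.
function listsToRead(prefs, listInfos) {
  if (!isEnabled(prefs)) {
    return []; // the whole aggregator is turned off
  }
  return listInfos.filter(isEnabled); // skip lists that are turned off
}

const listInfos = [
  { name: "movies", enabled: true },
  { name: "sports", enabled: false },
  { name: "tech" },
];

console.log(listsToRead({ enabled: true }, listInfos).map((l) => l.name));
console.log(listsToRead({ enabled: false }, listInfos).map((l) => l.name));
```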
I've seen this behavior before: at midnight I end up with an empty river, but after the next read the river is back to normal. I now know why this happens, because it showed up in podcatch.com, and there the problem takes longer to clear, because it takes longer for a new item to show up in the river.
The problem is in buildOneRiver. It starts by calling doOneDay (starttime), where starttime is the current time. There is no file in the /data/calendar/ folder for today. So the first read fails. When a read fails, it calls finishBuild (). Since we haven't seen any data yet, the river is empty. So an empty river is written and buildOneRiver returns.
It could be a total disaster if there ever was a day where nothing showed up. Not only would the river stay empty all day, but it would always stop when it hit the empty day, making all data before that date basically unreachable.
An easy workaround for now was to create an empty array for today. When I did that and rebuilt the river, all the data showed up again on Podcatch.com. Whew.
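The failure and the workaround can be modeled in a few lines. This is a simplified sketch, not the code in buildOneRiver: the calendar is a plain object standing in for the files in the calendar folder, and the dates and function name are made up.

```javascript
// Hypothetical model of the calendar folder: a map from a day to that
// day's array of items. A missing key stands for a missing file.
function buildRiverStoppingOnMissingDay(calendar, days) {
  const river = [];
  for (const day of days) {        // days are ordered newest first
    if (!(day in calendar)) {
      break;                       // a failed read finishes the build
    }
    river.push(...calendar[day]);
  }
  return river;
}

const calendar = { "2015-04-06": ["item A", "item B"] };
const days = ["2015-04-07", "2015-04-06"]; // just after midnight on the 7th

// No file for today yet, so the build stops at once: an empty river.
const before = buildRiverStoppingOnMissingDay(calendar, days);
console.log(before.length); // 0

// The workaround: create an empty array for today, then rebuild.
calendar["2015-04-07"] = [];
const after = buildRiverStoppingOnMissingDay(calendar, days);
console.log(after.length); // 2
```

The same model shows why a day with no file anywhere in the past would cut off everything older than it.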
This error probably showed up every night and wasn't cleared until the first podcast of the morning showed up. I never saw it because I'm generally not looking at podcatch.com after midnight. But other people were certainly seeing it.
We're going to need a better fix for this, but first I wanted to be sure to document the problem. Having done so I can now go to sleep.
PS: Thanks to the Pelicans and the Warriors. They were playing such an excellent game, that's why I was farting around with the computers after midnight, during a commercial break during the game.
River4 is pretty solid software, but it needs more attention to get it all the way.
I've been seeing some new, not good, behavior on my River4 installation in the last month or so. First it started going deaf, not responding to HTTP requests, causing my serverMonitor app to send emails saying the server was down. I'd get 20 or 30 of these a day. And the new items would take longer to show up after this started happening. It seemed as if the server was thrashing. Some operation that used to be quick was now taking a long time.
So I decided to create a new instance on Heroku that read the same lists, and let it run for a while (a week or so). It was running smoothly so I just switched domains, and shut the old one down. It was gratifying how easy this was to do. That was a couple of weeks ago.
Now the new instance is starting to misbehave.
Sometimes it renders one of the river files with just items from one feed. It corrects itself on the next run. It's annoying, it should be found and fixed, whatever is causing this, but I can live with it.
Then a couple of days ago, it stopped finding new items in feeds, for five hours. I rebooted the server, and that fixed it. Until it happened again this morning. Last new item in all my rivers was at approx 3AM. A reboot of the server seems to have cured it.
I'm very focused right now on shipping a new end-user product, so I can't detour to figure out what's going on here.
If anyone has the time to trap these problems, I'd be happy to try to fix them, once I can swing back around to River4. Probably within a few weeks.
I am sharing my subscription lists so other people can run a test setup with the same feeds as mine. It's possible that I'm pushing River4 harder than anyone else. So maybe I'm seeing problems you all will be seeing soon. Or maybe you are already seeing them?
Here's my rivers page, so you can see what I'm reading.
I've zipped up my current set of subscription lists.
Discuss on the River4 list.