In June I added the ability to write JavaScript callbacks for River4 that run when a new item is added to a river. They run from a folder, so they're easy to edit. You can have as many as you like. They can do anything a Node.js app can do; they even have persistent storage. And they can modify the data that River4 stores. We gained a lot of experience with these kinds of callbacks working on earlier River software in the Frontier environment.
Yesterday, Nicolas Meier released a River4 callback that sends all items in a river to a Pinboard account. This makes it possible to use the RSS feeds to populate a searchable database of news items. A good demo of the callback facility and feed technology.
If you have questions, ask on the River4 mail list.
And thanks to Nicolas for this contribution!
A subscription list is an OPML file in the lists folder of your River4 data folder.
Each list corresponds to a river. The feeds in the list are the feeds we read to find new items for the corresponding river.
Suppose you wanted a river of news about movies. You'd create an OPML file in the lists folder called movies.opml. River4 would read the feeds in that list periodically, and put the new items in the riverjs file in the rivers folder. The river file is called movies.js.
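To make the naming convention concrete, here's a minimal Node sketch that maps a list file name to the river file you'd expect to find. The helper name riverFileForList is made up for illustration; it is not a River4 routine, and it only encodes the movies.opml to movies.js rule described above.

```javascript
const path = require("path");

// Hypothetical helper, not part of River4: given the name of a list file,
// return the river file we'd expect to find in the rivers folder.
function riverFileForList(listFileName) {
  // Strip any folder and the .opml extension: "movies.opml" -> "movies"
  const name = path.basename(listFileName, ".opml");
  return "rivers/" + name + ".js";
}

console.log(riverFileForList("movies.opml"));
```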
To edit the list you can hand-edit the OPML in a text editor, or you can use an outliner whose native format is OPML.
My outliner, Fargo, is perfectly suited for editing OPML subscription lists.
First create a new outline, using the File/New command in Fargo.
Create a new headline by pressing Return. Type a short description. Choose Add Feed in the Outliner menu, and paste the URL of an RSS or Atom feed. Repeat this for all the feeds you want to add to your list.
If you want to include another OPML list in this list, press Return to add a new headline. Type a short description, then choose Add Include in the Outliner menu, and enter the URL of an OPML file. When River4 processes the list, it will be as if all the feeds in that list were in this list. Here's a Fargo docs page on includes.
You can use the include feature in the previous step to create a placeholder list in your lists folder. Just have it include the list you're editing in Fargo. You can get the URL of the outline by choosing Get Public Link in Fargo's File menu. This is a bit complicated, but worth studying and understanding because it makes Fargo a simple way to tell River4 what feeds you want it to read.
It's easy to have a second or third river: just create another OPML file in the lists folder. It's probably good to get started with a single list, to get a feel for how busy the feeds are and how many new items you get every time you look at the river.
This is a checklist I've used for installing River4 on a fresh Ubuntu v14.04 server.
sudo apt-get update
sudo apt-get install nodejs
sudo apt-get install npm
sudo apt-get install nodejs-legacy
We also install npm, a requirement to run Node apps.
nodejs-legacy makes it possible to run apps by saying node app.js instead of having to use nodejs, an oddity of Ubuntu.
sudo npm install forever -g
There are lots of ways to get apps to launch in the background. I like forever because it keeps the app running even if it crashes. River4, of course, never crashes (heh) but you never really know.
sudo apt-get install git
I like to install git, because it makes it easy to install River4 from GitHub.
git clone https://github.com/scripting/river4.git
That should create a directory containing River4 at /home/ubuntu/river4
To run it, follow the instructions on the River4 howto.
A 15-minute video showing how to install River4 on a Mac system.
I promised at the end of the video to upload a few screen shots of what the river looks like after it's been running a while.
The dashboard after running for 38 minutes.
The NBA news tab at that time.
The NYT news tab.
A snapshot of my river4data folder after running for 90 minutes.
I also uploaded it to Facebook and think it's a bit higher quality.
These instructions show you how to install River4 on a machine that's never had Node running on it, using the file system for storage. You can also use S3 for storage, and there's a separate howto for that kind of installation.
These instructions assume you're installing on a Macintosh, but they're very close to installing on a Unix or Windows machine.
Download Node.js from this page. Get the Macintosh installer. You don't need the source code.
Run the pkg file you downloaded. Accept all defaults, install for all users. At the end it tells you where everything was installed. I didn't need this information. It also says to make sure /usr/local/bin is in your $PATH. I didn't need to do anything; apparently it was set up correctly by default.
Download River4 from the GitHub repository, and create a folder (it can be anywhere). Copy all the files into that folder. Using the shell cd command, make that folder the current directory.
Install the packages River4 needs.
Launch River4:
Shortly after it launches, River4 automatically creates a river4data folder in the same folder as the River4 app. It contains sub-folders: data, lists and rivers.
For now, lists is the important folder. Copy a few OPML subscription lists into the lists folder. If you need some to help test your setup, you can download some examples here. Ultimately you'll want to create and maintain your own. Fargo is very good for that, its native file format is OPML. Use the Add Feed command in the Outliner menu.
Let River4 run for a while. As soon as there are new items in the feeds in your lists, river files will show up in the rivers folder, one corresponding to each of your lists. These are used by river browser software to display the new items for readers. See the next section.
River4 is also a web server, running on port 1337.
If you go to the home page of the server, you'll see the contents of your rivers, and commands that take you to the dashboard, repository, mail list, this blog.
Here's a 15-minute video where I install River4, famous-chef style.
It includes screen shots of the River4 dashboard and home page after 38 minutes, and a copy of a river4data folder after 90 minutes.
If you're going to run River4, I highly recommend joining the River4 mail list.
Andrew Shell reported an issue with breakage in JSON in a relatively recent version of JavaScript. Kind of disturbing that this kind of breakage can actually happen, but we're happy to have a fix.
In River4 v0.114, I did more than Andrew recommends, and replaced all the calls to JSON.stringify in River4 with calls to the utility routine jsonStringify, and then made the fix inside that routine.
I've tested this with my three main rivers (two River4 installations) and it seems to work.
New in River4 v0.110.
I had to switch from Heroku to an AWS-hosted Ubuntu server that's hosting a bunch of other apps. Without the isolation that Heroku provides, the environment variables of all these apps were getting in each other's way. Rather than hack my way through all the arcane rules of environment variables, I decided to create a simpler, more reliable (imho) way to configure River4 and nodeStorage, using a config.json file in the same directory as river4.js.
Here's a list of values you can include in this file:
fspath
password
s3path
PORT
s3defaultAcl
They work exactly the same way as the environment variables with the same names.
If you specify an environment variable and an element in config.json, the one in config.json takes precedence.
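The precedence rule can be sketched in a few lines of Node. This is an illustration only, not the code in river4.js: the helper names mergeConfig and readConfigFile are made up, and it assumes config.json is a flat JSON object and that you run it from the directory that contains the file.

```javascript
const fs = require("fs");

// The names listed above; anything else in config.json is ignored here.
const configNames = ["fspath", "password", "s3path", "PORT", "s3defaultAcl"];

// Hypothetical helper: start with the environment variables, then let
// values from config.json override them.
function mergeConfig(env, fileConfig) {
  const config = {};
  for (const name of configNames) {
    if (env[name] !== undefined) {
      config[name] = env[name];
    }
    if (fileConfig[name] !== undefined) {
      config[name] = fileConfig[name]; // config.json takes precedence
    }
  }
  return config;
}

// Hypothetical helper: read config.json. A missing or unreadable file
// is treated as "no overrides".
function readConfigFile(filePath) {
  try {
    return JSON.parse(fs.readFileSync(filePath, "utf8"));
  } catch (err) {
    return {};
  }
}

const config = mergeConfig(process.env, readConfigFile("config.json"));
console.log("configured values:", Object.keys(config));
```

The sketch prints only the names of the configured values, so a password doesn't end up in a log.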
If you're using S3 storage, you need to provide the three values for your AWS account. Since the Amazon library is looking for these in environment variables, you must provide them that way.
AWS_ACCESS_KEY_ID
AWS_SECRET_ACCESS_KEY
AWS_REGION
Here's an example, the contents of config.json on my server (with values changed).
There are three top-level folders: data, lists and rivers.
The only folder that contains input is the lists folder. Everything else is created and maintained by the River4 software.
The data folder has several sub-folders.
calendar is a calendar-structured folder broken down by year and month. Each file is a day's worth of news gathered from the feeds all your lists subscribe to.
feeds has a sub-folder for every feed your river is following. There's only one file in each sub-folder, but there's room for growth.
feedsInLists.json is a list of feeds, and a reference count for each feed. If the count goes to zero, we don't have to read the feed, because there aren't any lists subscribed to it.
feedsStats.json is a JSON array containing information about each of the feeds.
lists has a sub-folder for each of the lists. There's one file in each folder with info about the list.
prefsAndStats.json stores preference settings and overall stats for your River4 installation.
riversArray is an array of information about your rivers. You can edit the title and description for each river to control the display of the home page.
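The reference counting behind feedsInLists.json can be sketched like this. The data shapes are assumptions for illustration (a plain object mapping feed URLs to counts); the actual layout of the file isn't described here, and the function names are made up.

```javascript
// lists maps a list name to the feed URLs it subscribes to (sample shape).
// Returns an object mapping each feed URL to the number of lists using it.
function countFeedReferences(lists) {
  const counts = {};
  for (const listName of Object.keys(lists)) {
    for (const feedUrl of lists[listName]) {
      counts[feedUrl] = (counts[feedUrl] || 0) + 1;
    }
  }
  return counts;
}

// When a list goes away, lower the count for each feed it subscribed to.
function unsubscribe(counts, feedUrls) {
  for (const feedUrl of feedUrls) {
    if (counts[feedUrl] > 0) {
      counts[feedUrl] -= 1;
    }
  }
  return counts;
}

// Only feeds that some list still subscribes to need to be read.
function feedsToRead(counts) {
  return Object.keys(counts).filter((feedUrl) => counts[feedUrl] > 0);
}

const sampleCounts = countFeedReferences({
  movies: ["http://example.com/a.xml", "http://example.com/b.xml"],
  nba: ["http://example.com/b.xml"]
});
unsubscribe(sampleCounts, ["http://example.com/a.xml", "http://example.com/b.xml"]);
console.log(feedsToRead(sampleCounts));
```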
When you see a JSON file that has an element called enabled, if you set it false, the software will stop processing that object. So if you set the enabled element of a listInfo.json file to false, we will stop reading the list. It's useful if you want to keep everything in place, but turn off parts of the server.
You can turn the whole River4 aggregator and server off by setting enabled false in prefsAndStats.json.
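If you'd rather flip the switch from a script than from a text editor, something like the following would do it. It's a sketch under assumptions: the helper names are made up, and treating a missing enabled element as "on" is my guess, not documented behavior.

```javascript
const fs = require("fs");

// Hypothetical helper: set the "enabled" element of a JSON file and
// write the file back, leaving every other element untouched.
function setEnabled(filePath, flag) {
  const obj = JSON.parse(fs.readFileSync(filePath, "utf8"));
  obj.enabled = flag;
  fs.writeFileSync(filePath, JSON.stringify(obj, undefined, 4));
  return obj;
}

// Assumption: an object is processed unless enabled is explicitly false.
function isEnabled(obj) {
  return obj.enabled !== false;
}
```

For example, setEnabled ("river4data/data/prefsAndStats.json", false) would turn the whole server off, assuming your data folder is at that path.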
I've seen this behavior before: at midnight I end up with an empty river, but after the next read the river is back to normal. I now know why this happens, because it showed up on podcatch.com, where the problem takes longer to clear, since it takes longer for a new item to show up in the river.
The problem is in buildOneRiver. It starts by calling doOneDay (starttime), where starttime is the current time. There is no file in the /data/calendar/ folder for today. So the first read fails. When a read fails, it calls finishBuild (). Since we haven't seen any data yet, the river is empty. So an empty river is written and buildOneRiver returns.
It could be a total disaster if there ever was a day where nothing showed up. Not only would the river stay empty all day, but it would always stop when it hit the empty day, making all data before that date basically unreachable.
An easy workaround for now was to create an empty array for today. When I did that and rebuilt the river, all the data showed up again on Podcatch.com. Whew.
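To show the idea behind the workaround, here's a small sketch of a build loop that treats a missing day as an empty array instead of stopping. This is not the code in buildOneRiver; the in-memory calendar object, the function names, the sample dates and the maxDays limit are all assumptions made for illustration.

```javascript
// Hypothetical stand-in for the calendar folder: an object mapping
// "YYYY-MM-DD" to that day's array of items. A missing key plays the
// role of a day that has no file.
function dayKey(date) {
  return date.toISOString().slice(0, 10);
}

// Walk backward from starttime, one day at a time, collecting items
// until we have enough or have looked at maxDays days. A missing day
// counts as an empty array, so it doesn't end the build.
function buildRiver(calendar, starttime, maxItems, maxDays) {
  const river = [];
  const day = new Date(starttime.getTime());
  for (let i = 0; i < maxDays && river.length < maxItems; i++) {
    const items = calendar[dayKey(day)] || []; // missing day -> empty
    for (const item of items) {
      if (river.length < maxItems) {
        river.push(item);
      }
    }
    day.setUTCDate(day.getUTCDate() - 1); // step back one day
  }
  return river;
}

// Made-up sample data: nothing yet for the 8th, nothing at all on the 6th.
const sampleCalendar = {
  "2015-04-07": ["item C", "item B"],
  "2015-04-05": ["item A"]
};
console.log(buildRiver(sampleCalendar, new Date("2015-04-08T00:05:00Z"), 10, 7));
```

The maxDays limit is there so the loop still ends when the calendar runs out of data.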
This error probably showed up every night and wasn't cleared until the first podcast of the morning showed up. I never saw it because I'm generally not looking at podcatch.com after midnight. But other people were certainly seeing it.
We're going to need a better fix for this, but first I wanted to be sure to document the problem. Having done so I can now go to sleep.
Good night.
Dave
PS: Thanks to the Pelicans and the Warriors. They were playing such an excellent game, that's why I was farting around with the computers after midnight, during a commercial break during the game.
River4 is pretty solid software, but it needs more attention to get it all the way.
I've been seeing some new, not good, behavior on my River4 installation in the last month or so. First it started going deaf, not responding to HTTP requests, causing my serverMonitor app to send emails saying the server was down. I'd get 20 or 30 of these a day. And the new items would take longer to show up after this started happening. It seemed as if the server was thrashing. Some operation that used to be quick was now taking a long time.
So I decided to create a new instance on Heroku that read the same lists, and let it run for a while (a week or so). It was running smoothly so I just switched domains, and shut the old one down. It was gratifying how easy this was to do. That was a couple of weeks ago.
Now the new instance is starting to misbehave.
Sometimes it renders one of the river files with just items from one feed. It corrects itself on the next run. It's annoying, it should be found and fixed, whatever is causing this, but I can live with it.
Then a couple of days ago, it stopped finding new items in feeds, for five hours. I rebooted the server, and that fixed it. Until it happened again this morning. Last new item in all my rivers was at approx 3AM. A reboot of the server seems to have cured it.
I'm very focused right now on shipping a new end-user product, so I can't detour to figure out what's going on here.
If anyone has the time to trap these problems, I'd be happy to try to fix them, once I can swing back around to River4. Probably within a few weeks.
I am sharing my subscription lists so other people can run a test setup with the same feeds as mine. It's possible that I'm pushing River4 harder than anyone else. So maybe I'm seeing problems you all will be seeing soon. Or maybe you are already seeing them?
Here's my rivers page, so you can see what I'm reading.
I've zipped up my current set of subscription lists.
Discuss on the River4 list.
If you added a feed that did not have a description, saving would fail.
I was having a hell of a time getting the left-margin icons to respond to clicks. There's some weird interaction going on. If anyone has a clue, please post a comment. In the meantime, as a workaround, I added a new Feed menu to the menu bar with three commands that do what the icons do.
The big new addition is the River 4 Console.
You can edit subscription lists for your River4 installation in a browser, using the Concord outliner, the same one that's at the core of Fargo.
You can edit them either on the server machine, or remotely.
Install the latest version of River4, v0.108, on your server. You can download it from the GitHub repository.
If you want to access the server remotely using the new dashboard, set an environment variable, password. If you don't set the password, you will only be able to access the server through localhost.
Go to river4.io. In the Server menu, choose Settings.
Enter the address of your server, and the password you set in step 2. Screen shot.
Reload river4.io to get the data from your server.
You can add a new item to the list by clicking the + icon in the left margin. Enter the URL of the feed. If it could be read, the feed will appear in the list.
Click the Save button to send the changed list back to the server.
Choose any of the items in the Lists menu to load your list in the outliner box.
In the Server menu, you can also disable the server, and re-enable it. You can open the dashboard, to watch the server as it's running (replacing the original dashboard, which still works of course). And you can quickly view the river that is derived from the list you're editing.
In the left margin, you can edit the attributes of any headline in a subscription list, or view the feed in a special XML browsing window (avoiding the crappy way most browsers deal with RSS).
Basically, a bunch of new server endpoints to support editing of lists and prefs.
Originally, I planned to package this as a node-webkit app before releasing it. A couple of days ago I realized this was a bigger undertaking than I had imagined, and wanted to pause for a while here, to shake the bugs out, and learn how this works, before locking it down as a standalone app.
The goal remains, to make River4 more of an end-user experience. This release makes it easy to edit subscription lists, monitor the server, and turn off the server and turn it back on. A lot of the work toward reaching the goal is in this release. I plan to use it myself to manage my own River4 server, and encourage others to do the same.
A few notes I posted to the River4 mail list.
Jake Savin worked as a developer at UserLand, and I'm really glad he's using River4 now. We know each other pretty well, having worked together on Manila and Radio UserLand, which included the first in the series of Rivers culminating in River4.
This is a different way of doing it from any of the previous products, but then I realized -- it's not. You can use River4 in exactly the same way we used River3 with Dropbox, and it would work really well in this mode.
To do so, install River4 on a local machine, and configure it to read and write from a Dropbox-shared folder.
To edit one of your subscription lists, you can use the OPML Editor, because it's designed to read and write files on the local OS.
And you can include the River.js files in your pages, by getting a public link to the files. No need to operate Apache, or any public-facing server at all.
I just wanted to get this out there, maybe it'll make someone's life easier.
Now if an <source:outline> has a flMarkdown attribute with the value false, we'll render it as a structured outline.
This is the same convention that Fargo uses, so if you're using Fargo, your web page will look great, and so will the outline in the river.
Another loose-end tied off.
Fixing problems with running River4 with file system storage.
Only read lists whose names end with .opml. We were trying to read an invisible file on the Mac, .DS_Store.
When running in local file system mode, on Windows, there are more illegal characters than were previously tested for. Now we test for them. See this page on the Windows developer site for a list.
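Both fixes are simple to picture in code. The sketch below is not lifted from river4.js; the function names are made up, and the set of characters is the one Windows documents as reserved in file names, which may or may not match exactly what River4 tests for.

```javascript
// Keep only subscription lists. This skips invisible files such as
// .DS_Store on the Mac.
function isListFile(fileName) {
  return fileName.endsWith(".opml");
}

// Characters Windows reserves in file names: < > : " / \ | ? *
// plus the control characters 0 through 31. Replacing them with an
// underscore is an arbitrary choice made for this sketch.
function cleanFileName(name) {
  return name.replace(/[<>:"\/\\|?*\x00-\x1f]/g, "_");
}

console.log(["movies.opml", ".DS_Store", "notes.txt"].filter(isListFile));
console.log(cleanFileName("what? a \"feed\""));
```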
These instructions show you how to set up River4 to work with Amazon S3 storage.
A node.js installation.
An Amazon account, and an S3 bucket to store the JSON files, and a small HTML file.
One or more OPML subscription list files.
Create an S3 bucket to hold all your subscription lists, rivers, and data for the aggregator.
On the node.js system, set an environment variable, s3path, to contain the path to the bucket created in step 1.
Again, on the node.js system, set the two AWS environment variables. This allows the River4 app to write to your bucket.
Launch river4.js on a node.js system. Suppose that server is aggregator.mydomain.com.
Look in the bucket. You should see a data folder, with a single file in it containing the default value of prefs and stats for the app. There's also an index.html file, which will display your rivers in a simple way, providing code you can crib to create your own way of browsing (room for improvement here, for sure).
Create a folder at the top level of the bucket called "lists". Save one or more OPML subscription lists into that folder.
After a while you should see a new folder called "rivers" created automatically by the software. In that folder you should see one JSON file for each list. It contains the news from those feeds, discovered by River4. This format is designed to plug into the beautiful river displayer.
If you want to watch the progress of the aggregator, you can view this page.
Suppose you have River4 running, generating a few River.js files, and you want to include one in your news site or blog. That's why I put this simple example page together, to provide source code you can crib to do the job.
Here's a gist that contains the source of an HTML page that you can crib from.
The code on the Hello River page is yours to do with as you please. But the files it includes are copyrighted and not at this time available under an open source license.
Include river.js and river.css in your page. It automatically includes the files it needs.
The other files we include, the Lora font, menus.css, are just to make the sample app look good. They aren't required for your pages.
To load a river, call httpGetRiver with the url of the River.js file as its only parameter.
The commands in the Rivers menu load a few of the rivers that my server maintains.
I needed to remove the inclusion of jQuery and Bootstrap from river.js, so the river code could be included in Fargo's rendered pages. This means the Hello World app changed, and your code will have to change as well.
Just add the three lines in the <head> section of your page as the example does.
<script src="http://fargo.io/code/jquery-1.9.1.min.js"></script>
<link href="http://fargo.io/code/bootstrap.css" rel="stylesheet">
<script src="http://fargo.io/code/bootstrap.min.js"></script>
I removed the top and bottom margin from .divRiverContainer. It really should be up to the application that includes the river to determine what space is left above and below the river.
Important: There's a new version of these instructions that is easier, because the product has improved. These instructions are being maintained as an archive.
These instructions show you how to install River4 on a machine that's never had Node running on it, using the file system for storage. You can also use S3 for storage, and there's a separate howto for that kind of installation.
These instructions assume you're installing on a Macintosh, but they're very close to installing on a Unix or Windows machine.
Download Node.js from this page. Get the Macintosh installer. You don't need the source code.
Run the pkg file you downloaded. Accept all defaults, install for all users. At the end it tells you where everything was installed. I didn't need this information. It also says to make sure /usr/local/bin is in your $PATH. I didn't need to do anything; apparently it was set up correctly by default.
Download River4 from the GitHub repository, and create a folder (it can be anywhere). Copy all the files into that folder. The only two that are actually needed are river4.js and package.json, so if you want to save some space, you can delete the others. Using the shell cd command, make that folder the current directory.
Install the packages River4 needs.
I took a deep breath, reviewed everything to be sure I set it up correctly, and launched River4:
You can access the dashboard on the local computer with this URL, entered in a browser:
http://localhost:1337/dashboard
If you're going to run River4, I highly recommend joining the River4 mail list.
In previous versions, River4 stored all its data in an S3 bucket.
Starting in v0.96, it can be configured to store its data in the local filesystem.
The new version is on GitHub.
There's a new environment variable called fspath.
If it's not defined, everything is exactly as before.
If it is defined, we use a local directory for storage. It's specified by fspath.
I created a folder on my desktop computer, and set the environment variable with this command in the Unix shell:
export fspath=/Users/davewiner/river4data/
I created a lists sub-folder, and copied a few OPML subscription lists into the folder.
I copied river4.js into another folder and used the cd command to change to that directory, and entered:
node river4.js
The server booted up and started reading feeds.
PS: Of course I already had node.js installed, and had installed feedparser and opmlparser.
You can access the dashboard on the local computer with this URL, entered in a browser:
http://localhost:1337/dashboard
I wrote a new set of "fs" routines that are parallel to the s3 routines, and then added a layer of abstraction. Each routine calls the s3 routine if it's set up that way and calls the fs routine if it's set up the other way.
Fairly straightforward, and it worked the first time I ran it.
I have it running on two systems, one on a Mac desktop machine, storing data in the file system, and my original system in Heroku using S3 storage. Same software, configured differently.
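For readers who want a picture of the layering, here's a rough sketch of the idea, not the code in river4.js. The function names are made up, the calls are synchronous to keep it short, and the S3 side is a placeholder that throws, since a real one would call the Amazon library. It also picks a back end once, up front, where the approach described above checks inside each routine.

```javascript
const fs = require("fs");
const os = require("os");
const path = require("path");

// File system back end: paths are relative to the fspath folder.
function fsStore(fspath) {
  return {
    write(relPath, text) {
      const f = path.join(fspath, relPath);
      fs.mkdirSync(path.dirname(f), { recursive: true }); // create sub-folders
      fs.writeFileSync(f, text);
    },
    read(relPath) {
      return fs.readFileSync(path.join(fspath, relPath), "utf8");
    }
  };
}

// Placeholder for the S3 back end. Not implemented in this sketch.
function s3Store(s3path) {
  const notImplemented = () => {
    throw new Error("S3 back end not implemented in this sketch: " + s3path);
  };
  return { write: notImplemented, read: notImplemented };
}

// The layer of abstraction: if fspath is defined use the file system,
// otherwise fall back to S3, as before.
function getStore(env) {
  return env.fspath !== undefined ? fsStore(env.fspath) : s3Store(env.s3path);
}

// Demo using a throwaway folder instead of a real river4data folder.
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), "river4-demo-"));
const demoStore = getStore({ fspath: demoDir });
demoStore.write("data/hello.json", JSON.stringify({ hello: "river" }));
console.log(demoStore.read("data/hello.json"));
```

The rest of the app only ever calls write and read, so it doesn't need to know which kind of storage is behind them.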
A blog post, providing perspective.
Getting started with a new River4 installation can be slow going if you don't have a good set of subscription lists to start with.
To help get the process going for new users, here are a few of the lists I use, in a zip archive.
Each item in riverBrowser has a tiny gray share icon in the lower left corner of the item display. This icon acts just like the Radio3 bookmarklet. Click it, and Radio3 will come to the front, with the contents of this post populating the dialog.
If you're running the open source River4 aggregator, you can configure riverBrowser to access your river files.
You must enable CORS in the S3 bucket that River4 runs from. Here's a copy of the configuration I used for my bucket.
You must be signed on to Twitter. We only use Twitter for the ID system, RiverBrowser won't post anything to Twitter, or access any of your messages or private data. The command to sign on is in the system menu, at the right end of the menu bar.
Choose My rivers in the Rivers menu. An attribute editor pops up, with all the rivers that are currently being displayed by RiverBrowser. You can add or delete items from the list, or change the names of any of the items. This is where you can add pointers to your own rivers.
riverBrowser is the reader-interface for rivers produced by the River4 aggregator. River4 has new features that support output that's produced by both Fargo and Radio3.
All these products are designed to work with each other, and support innovation across all the products. The connections are all open, and all the parts are replaceable. You can generate compatible feeds with any blogging environment or content management system. The extensions to RSS we're using are all documented in the new source namespace. Any aggregator can produce the JSON files that River4 generates, that are designed to plug into RiverBrowser.
If you've been following my development process, this is the cap, where it all comes together in a simple neat package anyone can use. You don't have to understand all of the tools behind it, you don't have to understand any of them, actually. It's just useful and usable as-is.
Andrew Shell dug in and found the source of a problem in building rivers.
If River4 started to re-load the OPML reading lists while a build was going on, it could cause the rivers to skip items that shouldn't be skipped. The problem would correct itself quickly on my server (made it hard to track down the problem), not so quickly on Andrew's.
The problem was pretty easy to fix, after reading Andrew's thorough description.
I had to arrange things so that the list-reading happens after the rivers are built. I put a one minute cushion in there. Have a look at everyFiveMinutes () and everyMinute () to see how it was done.
The new version is on the River4 repository.
A River4 subscription list can now contain OPML include nodes, enabling lists of lists, a crucial feature for a modern RSS reader app.
This is nice when you want to let a group of people collaborate on the curation of a river.
Also makes it possible to use Fargo to edit subscription lists. Just edit the list in the S3 bucket to contain a node that includes the Fargo list. The editors don't need to have access to the bucket.
I needed this feature for a project I'm doing where the curator is comfortable editing OPML lists, but isn't ready to make the investment in learning how to operate a river.
It also makes it possible to switch the river to another server without changing any update procedures.
We only go two levels deep. That is, you can have an include in the top-level subscription list, but any includes in included lists are ignored.
The only reason for having this limit is that it eliminates the need to have a stack to make sure there's no circular recursion.
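Here's a minimal sketch of what the two-level rule means in practice. It is not River4's code. It works on already-parsed lists (arrays of nodes) rather than on OPML text, the function name expandList is made up, and fetchList is a stand-in for whatever reads and parses an included list.

```javascript
// A list is an array of nodes. A feed node has an xmlUrl. An include
// node has type "include" and a url that points at another list.
// fetchList (url) returns the nodes of the list at that url.
function expandList(nodes, fetchList) {
  const feeds = [];
  for (const node of nodes) {
    if (node.type === "include") {
      for (const inner of fetchList(node.url)) {
        // Includes inside an included list are ignored, so no stack
        // is needed to guard against circular references.
        if (inner.type !== "include") {
          feeds.push(inner.xmlUrl);
        }
      }
    } else {
      feeds.push(node.xmlUrl);
    }
  }
  return feeds;
}

// Made-up sample data: two lists that include each other.
const sampleLists = {
  "http://example.com/a.opml": [
    { type: "rss", xmlUrl: "http://example.com/feed1.xml" },
    { type: "include", url: "http://example.com/b.opml" }
  ],
  "http://example.com/b.opml": [
    { type: "rss", xmlUrl: "http://example.com/feed2.xml" },
    { type: "include", url: "http://example.com/a.opml" }
  ]
};
const lookup = (url) => sampleLists[url] || [];
console.log(expandList(sampleLists["http://example.com/a.opml"], lookup));
```

Even though the two sample lists include each other, the expansion ends after one level.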