Showing posts with label R. Show all posts
Showing posts with label R. Show all posts

Monday, 14 January 2013

Using TeXmacs as an interface for R

Using TeXmacs as an interface for R (part 1)

(This post was copied from my post at wordpress. Because I didn't manage to switch r-bloggers.com to use that blog, I'm reposting it here....)
A nice, but not very well known, interface to R is
TeXmacs. (I have to say that I am not totally objective, since I wrote the interface between R and TeXmacs…)
Here’s a sample window:
Screen Shot 2012-12-04 at 12.05.43 PM
In the following few posts I’d like to explain how to use this interface.

Installation

First, install TeXmacs. Best is to install the latest and greatest version of the svn, because that will include the latest changes of the interface with R, and in particular syntax highlighting. If you don’t install the latest version, you’ll probably only miss syntax highlighting. In that case it would probably be good to still install the latest version of the TeXmacs library for R.
Now that TeXmacs is installed, we can start it.
From the command line, just enter:

texmacs

A nice window will open:
TeXmacs main window
I will not go into how to work with TeXmacs very much, just the R interface.
Still, a few hints might help.
  1. You can chose your favorite key bindings in the menu Edit→Preferences→Look and Feel. I like emacs bindings.
  2. TeXmacs initially communicates with the user via the status line (bottom edge of window). If you prefer popups, select this from the menu Edit→Preferences→Interactive questions→In popup window.
  3. Usually TeXmacs files are stored with the extension “.tm”
  4. TeXmacs has a somewhat awkward interaction with the clipboard. Cut and paste within TeXmacs works fine. But, if you want to cut of paste to a different program, the text has to be converted from the special TeXmacs representation to text. To do that do:
    Edit→Copy to→Verbatim and Edit→Paste from→Verbatim. If you copy/paste a lot from other programs, you can chose:
    Tools→Selections→Import→Verbatim and Tools→Selections→Export→Verbatim. Then the default behaviour will be import and export in verbatim.
  5. TeXmacs has a lot of documentation. Explore the help menu and use its search functions.

Using TeXmacs and R

Now let’s work with R. Select the little monitor icon, and chose “R”.
Dropdown menu for choosing session type
After starting an R session, you get the usual R prompt ‘>’.
Type:

> 1:100

TeXmacs responds with the output from R:

[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18
[19] 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[37] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54
[55] 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72
[73] 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90
[91] 91 92 93 94 95 96 97 98 99 100

What is nice in using TeXmacs as an interface to R is that you can now move up to the previous line, edit it, and press enter again. The output will be replaced by The new output.
If you would like to enter multiple lines of text at once, press shift-enter, instead of enter. The input field will expend, and nothing will be sent to the R process. Once you are done, press enter. Here is an example:
Screen Shot 2012-12-03 at 10.28.24 PM
You also see two additional features: syntax highlighting, and special background colors for input and output text. To turn on this coloring scheme, select the aptly named menu item Documentadd package→Programvarsession.
BTW: If you prefer to invert the behaviour of enter and shift-enter, select the option multiline input as shown below:
Screen Shot 2012-12-03 at 10.34.37 PM

Graphics

TeXmacs+R can also handle graphics.
Enter the following expression:

> plot(1:10);v()

Notice that after the first plot() command, usually a new window pops up, and you have to select the TeXmacs window again.
The result will look something like the following:
Screen Shot 2012-12-03 at 10.43.41 PM
As you see, we needed two commands: the usual plot call, and v(). v() inserts the current plot as is into TeXmacs. So you can now do the following:

> title("Our first sample plot");v()

Which gives:
Screen Shot 2012-12-04 at 11.42.04 AM
We now have a second copy of our graph, with a title. You could, of course, have done two plot commands and one v().

> plot(1:10);title("Our first sample plot");v()

Let us try a more complicated example:
Here is a nice plot taken from the R Graph Gallery.
mu1mu2s11s12s22rhox1<-seq(-10,10,length=41) # generating the vector series x1
x2#
f<-function(x1,x2){
 term1  term2  term3  term4  term5  term1*exp(term2*(term3+term4-term5))
} # setting up the function of the multivariate normal density
#
z<-outer(x1,x2,f) # calculating the density values
#
persp(x1, x2, z,
      main="Two dimensional Normal Distribution",
      sub=expression(italic(f)~(bold(x))==frac(1,2~pi~sqrt(sigma[11]~
                     sigma[22]~(1-rho^2)))~phantom(0)~exp~bgroup("{",
              list(-frac(1,2(1-rho^2)),
              bgroup("[", frac((x[1]~-~mu[1])^2, sigma[11])~-~2~rho~frac(x[1]~-~mu[1],
              sqrt(sigma[11]))~ frac(x[2]~-~mu[2],sqrt(sigma[22])) ~+~
              frac((x[2]~-~mu[2])^2, sigma[22]),"]")),"}")),
      col="lightgreen",
      theta=30, phi=20,
      r=50,
      d=0.1,
      expand=0.5,
      ltheta=90, lphi=180,
      shade=0.75,
      ticktype="detailed",
      nticks=5) # produces the 3-D plot
# adding a text line to the graph
mtext(expression(list(mu[1]==0,mu[2]==0,sigma[11]==10,sigma[22]==10,sigma[12]==15,rho==0.5)), side=3)

Created by Pretty R at inside-R.org

In order to paste this text properly into the R session, you will have to chose Edit→paste from→Verbatim. This is because TeXmacs normally treats the clipboard as representing a special TeXmacs format, not plain text.
One you press enter, you’ll see something like this:
Screen Shot 2012-12-03 at 11.03.43 PM
At the Bottom you see a strange combination of > and +. These represent the prompts given by R as the lines are sent to the session. The current interface between TeXmacs and R can not tell when the R process finished evaluating its input. Not only that, but it might be that we’re not even finished with sending/receiving all the input lines. Thesefore, hit enter a couple more times, followed by v():
Screen Shot 2012-12-03 at 11.15.02 PM
As you can see, the size of the graph doesn’t fit very well. We can control this by specifying the size of the image that should actually be produced by R v(width=8,height=8) (remember you can go up and edit the previous expression):
Screen Shot 2012-12-03 at 11.18.11 PM
Much better!

Help

There are at least two ways of calling help in R inside TeXmacs. The first, is the usual help.start(), and using a web browser. The second, which is what I usually use, is to just ask for help in the usual way:

> ?plot

The help page is inserted formated into the current Buffer:
Screen Shot 2012-12-03 at 11.24.50 PM

Working on a remote server

One of the best features of the TeXmacs+R interface is the ability to work on remote systems. To do that it is best to first set up a possibility for password-less ssh sessions (which means you need to ssh-keygen on the source system, and add the public key in the file ~/.ssh/authorized_hosts.)
You should also install the TeXmacs package on the remote system.
One that is set up, enter the following command in an R session:

> system("ssh -t REMOTE_HOST R")

You should then get the prompt of the remote system. (The argument ‘-t’ makes ssh open a ‘psydo-terminal’, which interacts better with TeXmacs) Now enter:

> library(TeXmacs)

And, if you’d like to use graphics,

> start.view()

This last command opens a postscript device, and sets it up so that the v() command will copy from it to TeXmacs.
Here is what things look like then:
Screen Shot 2012-12-04 at 12.26.40 AM
Everything should work as on the local machine. Thus, you can ask R for help, and it should be inserted nicely formated into the current buffer.

Wednesday, 3 August 2011

Faster files in R

R is fairly slow in reading files. read.table() is slow, scan() a bit faster, and readLines() fastest.
But all these are nowhere as fast as other tools that scan through files.

Let us look at an example. I have in front of me a 283M file.

(Small update: the timings where off before. First because R hashes strings, one has to quit and restart R to get good timings. But I'm not really sure why my timings were off before...)


; time grep hello file
1.79 real 0.23 user 0.27 sys
; time wc file
774797 145661836 297147275 file
2.87 real 2.13 user 0.34 sys
; time cat file | rev >/dev/null
3.58 real 0.02 user 0.35 sys

> system.time({a=readLines("file")})
user system elapsed
25.158 0.829 26.500

> system.time({a=scan("file",what=character(),sep="\n")})
Read 774797 items
user system elapsed
30.526 0.827 31.734

> system.time({a=read.table("file",sep="\n",as.is=T)})
user system elapsed
31.880 0.802 32.837


sad, isn't it? And what does R do? Process data. So we read large files in all the time.

But beyond readLines(), R has deeper routines for reading files. There are also readChar() and readBin().

It turns out that using these, one can read in files faster.

Here is the code: (The second version further down is much better...)

my.read.lines <- function( fname, buf.size=5e7 ) {
s = file.info( fname )$size
in.file = file( fname, "r" )
buf=""
res = list()
i=1
while( s > 0 ) {
n = min( c( buf.size, s ) )
buf = paste(buf, readChar( in.file, n ),sep="" )
 
s = s - n
r = strsplit( buf, "\n", fixed=T, useBytes=T)[[1]]
n=nchar(buf)
if( substr(buf,n,n)=="\n" ) {
res[[i]] = r
buf = ""
} else {
res[[i]] = head(r,-1)
buf = tail(r,1)
}
i=i+1
}
close( in.file )
c( unlist(res), buf )
}

Created by Pretty R at inside-R.org



The gain is not amazing:

> system.time({a=my.read.lines("file")})
user system elapsed
22.277 2.739 25.175


How does the code work?
 buf = paste(buf, readChar( in.file, n ),sep="" )
reads the file. We read the file in big chunks, by default 50MB at a time. The exact size of the buffer doesn't matter - on my computer, 1MB was as good, but 10k much slower. Since the loop over these chunks is done inside R code, we want the loop to have not too many iterations.

We then split the file using
r = strsplit( buf, "\n", fixed=T, useBytes=T)[[1]]
the fixed=T parameter makes strsplit faster.

The rest of the function deals with preserving the part of the buffer that might have a partial line, because we read in constant-sized chunks. (I hope this is done correctly)

The result is that we read in a file in 1/3rd of the time. This function is based on the thread Reading large files quickly in the R help mailing list. In particular, Rob Steele's note. Does anyone have a faster solution? Can we get as fast as rev? As fast as grep?



Update.

I got comments that it isn't fair to compare to grep and wc, because they don't need to keep the whole file in memory. So, I tried the following primitive program:

#include < stdio.h >
#include < stdlib.h >

main()
{
FILE *f ;
char *a ;
f = fopen("file","r") ;

a=malloc(283000001) ;
fread( a, 1, 283000000, f ) ;

fclose(f) ;
}

That finished reading the file in 0.5sec. It is even possible to write a version of this function that uses R_alloc and can by dynamically loaded. That is quite fast. I then turned to study readChar(), and why it is slower than the C program. Maybe I'll write about it sometime. But it seems that reading the whole file in at once with readChar is much faster. Though it takes more memory...

Here is the new code:
my.read.lines2=function(fname) {
s = file.info( fname )$size
buf = readChar( fname, s, useBytes=T)
strsplit( buf,"\n",fixed=T,useBytes=T)[[1]]
}

Created by Pretty R at inside-R.org


And the timings:

> system.time({f=file("file","rb");a=readChar(f,file.info("file")$size,useBytes=T);close(f)})
user system elapsed
1.302 0.746 2.080

> system.time({a=my.read.lines2("file")})
user system elapsed
10.721 1.163 12.046


Much better. readChar() could be made a bit faster, but strsplit() is where more gain can be made. This code can be used instead of readLines if needed. I haven't shown a better scan() or read.table()....


This thread will continue for other read functions.