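As mentioned in the previous post, Scala's Source class supports several kinds of input: besides files, it can also read data from a URL.
The simplest way is Source.fromURL: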
    scala> scala.io.Source.fromURL("http://argcv.com").mkString
    res1: String =
    <!DOCTYPE html>
    <html lang="en-US">
    <head>
    <title>argcv | enjoy code, enjoy life.</title>
    ....
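But this is not very safe. We once ran into a problem where a server was misbehaving: one of our backends tried to connect to it, the call hung, and this kept happening until we had used up all of our connections. We should add a timeout.
After tweaking some code, I ended up with the following simple function: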
    def fromUrlWithTimeout(url: String, timeout: Int = 1500): String = {
      import java.net.URL
      import scala.io.Source
      import scala.util.Try

      // timeout is in milliseconds and is used for both connecting and reading
      val conn = new URL(url).openConnection()
      conn.setConnectTimeout(timeout)
      conn.setReadTimeout(timeout)
      // A failure to connect (including a connect timeout) propagates to the caller
      val stream = conn.getInputStream()
      try {
        // A failure while reading the body yields an empty string
        Try(Source.fromInputStream(stream).mkString).getOrElse("")
      } finally {
        stream.close()
      }
    }
Using it is simple:
    scala> fromUrlWithTimeout("http://argcv.com",3000)
    res2: String =
    <!DOCTYPE html>
    <html lang="en-US">
    <head>
    <title>argcv | enjoy code, enjoy life.</title>
    ...
Or, with a very small timeout, the result looks like this:
    scala> fromUrlWithTimeout("http://argcv.com",100)
    java.net.SocketTimeoutException: connect timed out
      at java.net.PlainSocketImpl.socketConnect(Native Method)
      at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
      ....
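Since a connect timeout surfaces as an exception rather than a return value, a caller may want to turn it into an ordinary value. The following is only a minimal sketch of one way to do that: the helper name `attempt` and the error strings are hypothetical and not part of the function above.

```scala
import java.net.SocketTimeoutException
import scala.util.{Failure, Success, Try}

// Hypothetical helper (not from the original post): evaluates any fetch
// expression and reports its outcome as an Either instead of throwing.
def attempt(fetch: => String): Either[String, String] =
  Try(fetch) match {
    case Success(body)                      => Right(body)
    // Both connect and read timeouts are SocketTimeoutException
    case Failure(_: SocketTimeoutException) => Left("timed out")
    // Any other non-fatal failure is reported by its class name
    case Failure(e)                         => Left(s"failed: ${e.getClass.getSimpleName}")
  }
```

It could then be called as `attempt(fromUrlWithTimeout("http://argcv.com", 100))`, which should give a `Left` when the connection times out instead of a stack trace.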
To download a file, see here.
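For reference, the same two timeouts can also be applied when saving a response to disk. This is a rough sketch rather than the approach from the linked post: the name `downloadWithTimeout` and its parameters are assumptions made for illustration, and it relies only on `java.net.URLConnection` and `java.nio.file.Files.copy`.

```scala
import java.net.URL
import java.nio.file.{Files, Path, StandardCopyOption}

// Hypothetical helper: copies the body at `url` into `target`, overwriting it
// if it exists, and returns the number of bytes written.
// `timeout` is in milliseconds and is used for both connecting and reading.
def downloadWithTimeout(url: String, target: Path, timeout: Int = 1500): Long = {
  val conn = new URL(url).openConnection()
  conn.setConnectTimeout(timeout)
  conn.setReadTimeout(timeout)
  val stream = conn.getInputStream()
  // Unlike fromUrlWithTimeout, errors during the copy are not swallowed
  try Files.copy(stream, target, StandardCopyOption.REPLACE_EXISTING)
  finally stream.close()
}
```

Streaming straight to the file avoids holding the whole body in memory, which `mkString` would do.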