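As mentioned in the previous post, Scala's Source class comes with a variety of input sources. Besides files, it can also read data from a URL.

The simplest option is Source.fromURL: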

scala> scala.io.Source.fromURL("http://argcv.com").mkString
res1: String =
<!DOCTYPE html>
<html lang="en-US">
<head>
    <title>argcv | enjoy code, enjoy life.</title>
....
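One detail the one-liner above skips is that Source.fromURL returns a Source that holds an open stream, and it decodes with the default codec unless told otherwise. Below is a minimal sketch of a helper that passes the charset explicitly and closes the Source afterwards. The name readUrl and the temp-file URL are my own assumptions for illustration (the temp file only keeps the demo offline), not something from the original code.

```scala
import java.nio.file.Files
import scala.io.{Codec, Source}

// Hypothetical helper: read a URL fully as UTF-8 and always close the Source
def readUrl(url: String): String = {
  val src = Source.fromURL(url)(Codec.UTF8)
  try src.mkString
  finally src.close()
}

// Any URL scheme the JVM supports works; a file: URL avoids needing a network
val tmp = Files.createTempFile("source-demo", ".txt")
Files.write(tmp, "hello from a URL".getBytes("UTF-8"))
val text = readUrl(tmp.toUri.toURL.toString)
Files.delete(tmp)
```

Note that this sketch still has no timeout of any kind.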

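However, Source.fromURL on its own is not very safe. We once ran into a problem where a server was misbehaving: one of our backends kept trying to connect to it and hung, until all of our available connections were used up. We should add a timeout.

After reworking the code a bit, I ended up with the simple function below: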

  def fromUrlWithTimeout(url: String, timeout: Int = 1500): String = {
    import java.net.URL
    import scala.io.Source
    import scala.util.Try
    // timeout is in milliseconds and applies to both connecting and reading
    val conn = new URL(url).openConnection()
    conn.setConnectTimeout(timeout)
    conn.setReadTimeout(timeout)
    // a connect timeout surfaces here as java.net.SocketTimeoutException
    val stream = conn.getInputStream()
    // a non-fatal failure while reading the body yields "" instead of throwing;
    // the stream is closed either way
    try Try(Source.fromInputStream(stream).mkString).getOrElse("")
    finally stream.close()
  }
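The function above throws when the connection cannot be opened but returns an empty string when reading the body fails, so a caller cannot tell a failed read from an empty page. As a rough sketch of one alternative, the variant below wraps the whole call in scala.util.Try so every non-fatal failure comes back as a Failure. The name tryFromUrlWithTimeout is hypothetical, and this is only one possible design, not the version we run in production.

```scala
import java.net.URL
import scala.io.Source
import scala.util.Try

// Hypothetical variant: same timeouts, but failures are returned, not thrown
def tryFromUrlWithTimeout(url: String, timeout: Int = 1500): Try[String] = Try {
  val conn = new URL(url).openConnection()
  conn.setConnectTimeout(timeout) // milliseconds
  conn.setReadTimeout(timeout)    // milliseconds
  val stream = conn.getInputStream()
  try Source.fromInputStream(stream).mkString
  finally stream.close()
}

// A malformed URL and a missing file both end up as Failure values
val malformed = tryFromUrlWithTimeout("not-a-url")
val missing = tryFromUrlWithTimeout("file:///no/such/file/for/this/demo")
```

The caller can then pattern match on Success and Failure, or use getOrElse("") to get the old behaviour back.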

Using fromUrlWithTimeout is simple as well:

scala> fromUrlWithTimeout("http://argcv.com",3000)
res2: String =
<!DOCTYPE html>
<html lang="en-US">
<head>
    <title>argcv | enjoy code, enjoy life.</title>
...

Or, with a very small timeout, the result looks like this:

scala> fromUrlWithTimeout("http://argcv.com",100)
java.net.SocketTimeoutException: connect timed out
  at java.net.PlainSocketImpl.socketConnect(Native Method)
  at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:345)
....
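In a backend you usually want to handle that exception rather than let it escape. The sketch below catches SocketTimeoutException at the call site and turns it into None. To keep it reproducible offline, it assumes a local ServerSocket that never answers as a stand-in for the hung server mentioned earlier. The names fetchOrNone and silent are made up for this illustration.

```scala
import java.net.{ServerSocket, SocketTimeoutException, URL}
import scala.io.Source

// Hypothetical helper: None on a connect or read timeout, Some(body) otherwise
def fetchOrNone(url: String, timeout: Int): Option[String] =
  try {
    val conn = new URL(url).openConnection()
    conn.setConnectTimeout(timeout)
    conn.setReadTimeout(timeout)
    val stream = conn.getInputStream()
    try Some(Source.fromInputStream(stream).mkString)
    finally stream.close()
  } catch {
    case _: SocketTimeoutException => None
  }

// The OS completes the TCP handshake for this socket, but nothing ever
// replies, so the client should block in the read until the timeout fires.
val silent = new ServerSocket(0)
val result =
  try fetchOrNone(s"http://127.0.0.1:${silent.getLocalPort}/", 200)
  finally silent.close()
```

Other I/O errors, such as a refused connection, still propagate from this helper, which keeps timeouts distinguishable from other failures.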

To download a file instead, see here.

Categories: Code

Yu
