Source Code for Module cherrypy._cpreqbody

"""Request body processing for CherryPy.

.. versionadded:: 3.2

Application authors have complete control over the parsing of HTTP request
entities. In short,
:attr:`cherrypy.request.body<cherrypy._cprequest.Request.body>`
is now always set to an instance of
:class:`RequestBody<cherrypy._cpreqbody.RequestBody>`,
and *that* class is a subclass of :class:`Entity<cherrypy._cpreqbody.Entity>`.

When an HTTP request includes an entity body, it is often desirable to
provide that information to applications in a form other than the raw bytes.
Different content types demand different approaches. Examples:

 * For a GIF file, we want the raw bytes in a stream.
 * An HTML form is better parsed into its component fields, and each text field
   decoded from bytes to unicode.
 * A JSON body should be deserialized into a Python dict or list.

When the request contains a Content-Type header, the media type is used as a
key to look up a value in the
:attr:`request.body.processors<cherrypy._cpreqbody.Entity.processors>` dict.
If the full media
type is not found, then the major type is tried; for example, if no processor
is found for the 'image/jpeg' type, then we look for a processor for the
'image' types altogether. If neither the full type nor the major type has a
matching processor, then a default processor is used
(:func:`default_proc<cherrypy._cpreqbody.Entity.default_proc>`). For most
types, this means no processing is done, and the body is left unread as a
raw byte stream. Processors are configurable in an 'on_start_resource' hook.

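The lookup order described above can be sketched in plain Python. This is a
minimal illustration rather than CherryPy's own code: ``find_processor`` and
``registry`` are hypothetical names, the dict values are labels standing in
for processor callables, and the sketch accepts a raw header string, whereas
CherryPy works from an already parsed Content-Type.

```python
def find_processor(processors, content_type, default):
    # Drop any parameters (such as charset or boundary) from the header value.
    media_type = content_type.split(';', 1)[0].strip()
    try:
        # First choice: the full media type, e.g. 'multipart/form-data'.
        return processors[media_type]
    except KeyError:
        # Second choice: the major type, e.g. 'multipart'.
        major_type = media_type.split('/', 1)[0]
        # Last resort: the caller-supplied default.
        return processors.get(major_type, default)


# Hypothetical registry; the values are labels, not real processors.
registry = {
    'multipart/form-data': 'form-data processor',
    'multipart': 'generic multipart processor',
}

print(find_processor(registry, 'multipart/form-data; boundary=x', 'default'))
print(find_processor(registry, 'multipart/mixed', 'default'))
print(find_processor(registry, 'image/jpeg', 'default'))
```
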
Some processors, especially those for the 'text' types, attempt to decode bytes
to unicode. If the Content-Type request header includes a 'charset' parameter,
this is used to decode the entity. Otherwise, one or more default charsets may
be attempted, although this decision is up to each processor. If a processor
successfully decodes an Entity or Part, it should set the
:attr:`charset<cherrypy._cpreqbody.Entity.charset>` attribute
on the Entity or Part to the name of the successful charset, so that
applications can easily re-encode or transcode the value if they wish.

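The "try each charset in order" behaviour can be illustrated with a small
standalone sketch. The helper name ``decode_with_charsets`` is hypothetical,
and it raises ``ValueError`` on failure purely to stay free of dependencies;
a processor in this module would report the failure as an HTTP 400 instead.

```python
def decode_with_charsets(raw, attempt_charsets):
    # Return (text, charset) for the first charset that decodes raw cleanly.
    for charset in attempt_charsets:
        try:
            return raw.decode(charset), charset
        except UnicodeDecodeError:
            # This charset did not fit; move on to the next candidate.
            continue
    raise ValueError('could not decode with any of %r' % (attempt_charsets,))


# b'caf\xe9' is not valid UTF-8, so the second candidate is the one recorded.
text, charset = decode_with_charsets(b'caf\xe9', ['utf-8', 'ISO-8859-1'])
print(charset)
```

Recording the charset that succeeded is what lets an application re-encode
the decoded value back to the original bytes later on.
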
If the Content-Type of the request entity is of major type 'multipart', then
the above parsing process, and possibly a decoding process, is performed for
each part.

For both the full entity and multipart parts, a Content-Disposition header may
be used to fill :attr:`name<cherrypy._cpreqbody.Entity.name>` and
:attr:`filename<cherrypy._cpreqbody.Entity.filename>` attributes on the
request.body or the Part.

.. _custombodyprocessors:

Custom Processors
=================

You can add your own processors for any specific or major MIME type. Simply add
it to the :attr:`processors<cherrypy._cpreqbody.Entity.processors>` dict in a
hook/tool that runs at ``on_start_resource`` or ``before_request_body``.
Here's the built-in JSON tool as an example::

    def json_in(force=True, debug=False):
        request = cherrypy.serving.request
        def json_processor(entity):
            \"""Read application/json data into request.json.\"""
            if not entity.headers.get("Content-Length", ""):
                raise cherrypy.HTTPError(411)

            body = entity.fp.read()
            try:
                request.json = json_decode(body)
            except ValueError:
                raise cherrypy.HTTPError(400, 'Invalid JSON document')
        if force:
            request.body.processors.clear()
            request.body.default_proc = cherrypy.HTTPError(
                415, 'Expected an application/json content type')
        request.body.processors['application/json'] = json_processor

We begin by defining a new ``json_processor`` function to stick in the
``processors`` dictionary. All processor functions take a single argument,
the ``Entity`` instance they are to process. It will be called whenever a
request is received (for those URIs where the tool is turned on) which
has a ``Content-Type`` of "application/json".

First, it checks for a valid ``Content-Length`` (raising 411 if not valid),
then reads the remaining bytes on the socket. The ``fp`` object knows its
own length, so it won't hang waiting for data that never arrives. It will
return when all data has been read. Then, we decode those bytes using
Python's built-in ``json`` module, and stick the decoded result onto
``request.json``. If it cannot be decoded, we raise 400.

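To see why a reader that knows its own length cannot hang, consider the
following standalone sketch. ``LengthLimitedReader`` is a hypothetical,
heavily simplified stand-in for the wrapper this module places around the
socket; it only tracks how many bytes remain and never asks the underlying
stream for more than that.

```python
import io


class LengthLimitedReader(object):
    # Hypothetical simplification: read at most 'length' bytes from fp.

    def __init__(self, fp, length):
        self.fp = fp
        self.remaining = length

    def read(self, size=None):
        # Never request more than the declared length allows.
        if size is None or size > self.remaining:
            size = self.remaining
        data = self.fp.read(size)
        self.remaining -= len(data)
        return data


# The stream holds extra bytes after the 8-byte JSON body.
stream = io.BytesIO(b'{"a": 1}EXTRA')
reader = LengthLimitedReader(stream, 8)
body = reader.read()
rest = reader.read()
print(body, rest)
```
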
If the "force" argument is True (the default), the ``Tool`` clears the
``processors`` dict so that request entities of other ``Content-Types``
aren't parsed at all. Since there's no entry for those MIME types, the
``default_proc`` method of ``cherrypy.request.body`` is called. By default
this does nothing, which usually gives the page handler an opportunity to
handle the body itself. In our case, however, we want to raise 415, so we
replace ``request.body.default_proc`` with the error (``HTTPError``
instances, when called, raise themselves).

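The "instances raise themselves when called" trick is easy to reproduce in
isolation. The class below is a hypothetical stand-in written for this
illustration only; it is not ``cherrypy.HTTPError``, but it shows why an
exception instance can be dropped in wherever a no-argument callable is
expected.

```python
class SelfRaisingError(Exception):
    # Hypothetical exception whose instances raise themselves when called.

    def __call__(self):
        raise self


# The instance is stored where a callable is expected...
default_proc = SelfRaisingError(415, 'Expected an application/json content type')

# ...and calling it later raises the stored error.
try:
    default_proc()
except SelfRaisingError as exc:
    status, message = exc.args

print(status, message)
```
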
If you are defining a custom processor, you can do so without making a
``Tool``. Just add the config entry::

    request.body.processors = {'application/json': json_processor}

Note that you can only replace the ``processors`` dict wholesale this way,
not update the existing one.
"""

try:
    from io import DEFAULT_BUFFER_SIZE
except ImportError:
    DEFAULT_BUFFER_SIZE = 8192
import re
import sys
import tempfile
try:
    from urllib import unquote_plus
except ImportError:
    def unquote_plus(bs):
        """Bytes version of urllib.parse.unquote_plus."""
        bs = bs.replace(ntob('+'), ntob(' '))
        atoms = bs.split(ntob('%'))
        for i in range(1, len(atoms)):
            item = atoms[i]
            try:
                pct = int(item[:2], 16)
                atoms[i] = bytes([pct]) + item[2:]
            except ValueError:
                pass
        return ntob('').join(atoms)

import cherrypy
from cherrypy._cpcompat import basestring, ntob, ntou
from cherrypy.lib import httputil


# ------------------------------- Processors -------------------------------- #

def process_urlencoded(entity):
    """Read application/x-www-form-urlencoded data into entity.params."""
    qs = entity.fp.read()
    for charset in entity.attempt_charsets:
        try:
            params = {}
            for aparam in qs.split(ntob('&')):
                for pair in aparam.split(ntob(';')):
                    if not pair:
                        continue

                    atoms = pair.split(ntob('='), 1)
                    if len(atoms) == 1:
                        atoms.append(ntob(''))

                    key = unquote_plus(atoms[0]).decode(charset)
                    value = unquote_plus(atoms[1]).decode(charset)

                    if key in params:
                        if not isinstance(params[key], list):
                            params[key] = [params[key]]
                        params[key].append(value)
                    else:
                        params[key] = value
        except UnicodeDecodeError:
            pass
        else:
            entity.charset = charset
            break
    else:
        raise cherrypy.HTTPError(
            400, "The request entity could not be decoded. The following "
            "charsets were attempted: %s" % repr(entity.attempt_charsets))

    # Now that all values have been successfully parsed and decoded,
    # apply them to the entity.params dict.
    for key, value in params.items():
        if key in entity.params:
            if not isinstance(entity.params[key], list):
                entity.params[key] = [entity.params[key]]
            entity.params[key].append(value)
        else:
            entity.params[key] = value


def process_multipart(entity):
    """Read all multipart parts into entity.parts."""
    ib = ""
    if 'boundary' in entity.content_type.params:
        # http://tools.ietf.org/html/rfc2046#section-5.1.1
        # "The grammar for parameters on the Content-type field is such that it
        # is often necessary to enclose the boundary parameter values in quotes
        # on the Content-type line"
        ib = entity.content_type.params['boundary'].strip('"')

    if not re.match("^[ -~]{0,200}[!-~]$", ib):
        raise ValueError('Invalid boundary in multipart form: %r' % (ib,))

    ib = ('--' + ib).encode('ascii')

    # Find the first marker
    while True:
        b = entity.readline()
        if not b:
            return

        b = b.strip()
        if b == ib:
            break

    # Read all parts
    while True:
        part = entity.part_class.from_fp(entity.fp, ib)
        entity.parts.append(part)
        part.process()
        if part.fp.done:
            break


def process_multipart_form_data(entity):
    """Read all multipart/form-data parts into entity.parts or entity.params.
    """
    process_multipart(entity)

    kept_parts = []
    for part in entity.parts:
        if part.name is None:
            kept_parts.append(part)
        else:
            if part.filename is None:
                # It's a regular field
                value = part.fullvalue()
            else:
                # It's a file upload. Retain the whole part so consumer code
                # has access to its .file and .filename attributes.
                value = part

            if part.name in entity.params:
                if not isinstance(entity.params[part.name], list):
                    entity.params[part.name] = [entity.params[part.name]]
                entity.params[part.name].append(value)
            else:
                entity.params[part.name] = value

    entity.parts = kept_parts


def _old_process_multipart(entity):
    """The behavior of 3.2 and lower. Deprecated and will be changed in 3.3."""
    process_multipart(entity)

    params = entity.params

    for part in entity.parts:
        if part.name is None:
            key = ntou('parts')
        else:
            key = part.name

        if part.filename is None:
            # It's a regular field
            value = part.fullvalue()
        else:
            # It's a file upload. Retain the whole part so consumer code
            # has access to its .file and .filename attributes.
            value = part

        if key in params:
            if not isinstance(params[key], list):
                params[key] = [params[key]]
            params[key].append(value)
        else:
            params[key] = value


# -------------------------------- Entities --------------------------------- #
class Entity(object):

    """An HTTP request body, or MIME multipart body.

    This class collects information about the HTTP request entity. When a
    given entity is of MIME type "multipart", each part is parsed into its own
    Entity instance, and the set of parts stored in
    :attr:`entity.parts<cherrypy._cpreqbody.Entity.parts>`.

    Between the ``before_request_body`` and ``before_handler`` tools, CherryPy
    tries to process the request body (if any) by calling
    :func:`request.body.process<cherrypy._cpreqbody.RequestBody.process>`.
    This uses the ``content_type`` of the Entity to look up a suitable
    processor in
    :attr:`Entity.processors<cherrypy._cpreqbody.Entity.processors>`,
    a dict.
    If a matching processor cannot be found for the complete Content-Type,
    it tries again using the major type. For example, if a request with an
    entity of type "image/jpeg" arrives, but no processor can be found for
    that complete type, then one is sought for the major type "image". If a
    processor is still not found, then the
    :func:`default_proc<cherrypy._cpreqbody.Entity.default_proc>` method
    of the Entity is called (which does nothing by default; you can
    override this too).

    CherryPy includes processors for the "application/x-www-form-urlencoded"
    type, the "multipart/form-data" type, and the "multipart" major type.
    CherryPy 3.2 processes these types almost exactly as older versions.
    Parts are passed as arguments to the page handler using their
    ``Content-Disposition.name`` if given, otherwise in a generic "parts"
    argument. Each such part is either a string, or the
    :class:`Part<cherrypy._cpreqbody.Part>` itself if it's a file. (In this
    case it will have ``file`` and ``filename`` attributes, or possibly a
    ``value`` attribute). Each Part is itself a subclass of
    Entity, and has its own ``process`` method and ``processors`` dict.

    There is a separate processor for the "multipart" major type which is more
    flexible, and simply stores all multipart parts in
    :attr:`request.body.parts<cherrypy._cpreqbody.Entity.parts>`. You can
    enable it with::

        cherrypy.request.body.processors['multipart'] = _cpreqbody.process_multipart

    in an ``on_start_resource`` tool.
    """

    # http://tools.ietf.org/html/rfc2046#section-4.1.2:
    # "The default character set, which must be assumed in the
    # absence of a charset parameter, is US-ASCII."
    # However, many browsers send data in utf-8 with no charset.
    attempt_charsets = ['utf-8']
    """A list of strings, each of which should be a known encoding.

    When the Content-Type of the request body warrants it, each of the given
    encodings will be tried in order. The first one to successfully decode the
    entity without raising an error is stored as
    :attr:`entity.charset<cherrypy._cpreqbody.Entity.charset>`. This defaults
    to ``['utf-8']`` (plus 'ISO-8859-1' for "text/\*" types, as required by
    `HTTP/1.1 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1>`_),
    but ``['us-ascii', 'utf-8']`` for multipart parts.
    """

    charset = None
    """The successful decoding; see "attempt_charsets" above."""

    content_type = None
    """The value of the Content-Type request header.

    If the Entity is part of a multipart payload, this will be the Content-Type
    given in the MIME headers for this part.
    """

    default_content_type = 'application/x-www-form-urlencoded'
    """This defines a default ``Content-Type`` to use if no Content-Type header
    is given. The empty string is used for RequestBody, which results in the
    request body not being read or parsed at all. This is by design; a missing
    ``Content-Type`` header in the HTTP request entity is an error at best,
    and a security hole at worst. For multipart parts, however, the MIME spec
    declares that a part with no Content-Type defaults to "text/plain"
    (see :class:`Part<cherrypy._cpreqbody.Part>`).
    """

    filename = None
    """The ``Content-Disposition.filename`` header, if available."""

    fp = None
    """The readable socket file object."""

    headers = None
    """A dict of request/multipart header names and values.

    This is a copy of the ``request.headers`` for the ``request.body``;
    for multipart parts, it is the set of headers for that part.
    """

    length = None
    """The value of the ``Content-Length`` header, if provided."""

    name = None
    """The "name" parameter of the ``Content-Disposition`` header, if any."""

    params = None
    """
    If the request Content-Type is 'application/x-www-form-urlencoded' or
    multipart, this will be a dict of the params pulled from the entity
    body; that is, it will be the portion of request.params that come
    from the message body (sometimes called "POST params", although they
    can be sent with various HTTP method verbs). This value is set between
    the 'before_request_body' and 'before_handler' hooks (assuming that
    process_request_body is True)."""

    processors = {'application/x-www-form-urlencoded': process_urlencoded,
                  'multipart/form-data': process_multipart_form_data,
                  'multipart': process_multipart,
                  }
    """A dict of Content-Type names to processor methods."""

    parts = None
    """A list of Part instances if ``Content-Type`` is of major type
    "multipart"."""

    part_class = None
    """The class used for multipart parts.

    You can replace this with custom subclasses to alter the processing of
    multipart parts.
    """

    def __init__(self, fp, headers, params=None, parts=None):
        # Make an instance-specific copy of the class processors
        # so Tools, etc. can replace them per-request.
        self.processors = self.processors.copy()

        self.fp = fp
        self.headers = headers

        if params is None:
            params = {}
        self.params = params

        if parts is None:
            parts = []
        self.parts = parts

        # Content-Type
        self.content_type = headers.elements('Content-Type')
        if self.content_type:
            self.content_type = self.content_type[0]
        else:
            self.content_type = httputil.HeaderElement.from_str(
                self.default_content_type)

        # Copy the class 'attempt_charsets', prepending any Content-Type
        # charset
        dec = self.content_type.params.get("charset", None)
        if dec:
            self.attempt_charsets = [dec] + [c for c in self.attempt_charsets
                                             if c != dec]
        else:
            self.attempt_charsets = self.attempt_charsets[:]

        # Length
        self.length = None
        clen = headers.get('Content-Length', None)
        # If Transfer-Encoding is 'chunked', ignore any Content-Length.
        if (
            clen is not None and
            'chunked' not in headers.get('Transfer-Encoding', '')
        ):
            try:
                self.length = int(clen)
            except ValueError:
                pass

        # Content-Disposition
        self.name = None
        self.filename = None
        disp = headers.elements('Content-Disposition')
        if disp:
            disp = disp[0]
            if 'name' in disp.params:
                self.name = disp.params['name']
                if self.name.startswith('"') and self.name.endswith('"'):
                    self.name = self.name[1:-1]
            if 'filename' in disp.params:
                self.filename = disp.params['filename']
                if (
                    self.filename.startswith('"') and
                    self.filename.endswith('"')
                ):
                    self.filename = self.filename[1:-1]

    # The 'type' attribute is deprecated in 3.2; remove it in 3.3.
    type = property(
        lambda self: self.content_type,
        doc="A deprecated alias for "
            ":attr:`content_type<cherrypy._cpreqbody.Entity.content_type>`."
    )

    def read(self, size=None, fp_out=None):
        return self.fp.read(size, fp_out)

    def readline(self, size=None):
        return self.fp.readline(size)

    def readlines(self, sizehint=None):
        return self.fp.readlines(sizehint)

    def __iter__(self):
        return self

    def __next__(self):
        line = self.readline()
        if not line:
            raise StopIteration
        return line

    def next(self):
        return self.__next__()

    def read_into_file(self, fp_out=None):
        """Read the request body into fp_out (or make_file() if None).

        Return fp_out.
        """
        if fp_out is None:
            fp_out = self.make_file()
        self.read(fp_out=fp_out)
        return fp_out

    def make_file(self):
        """Return a file-like object into which the request body will be read.

        By default, this will return a TemporaryFile. Override as needed.
        See also :attr:`cherrypy._cpreqbody.Part.maxrambytes`."""
        return tempfile.TemporaryFile()

    def fullvalue(self):
        """Return this entity as a string, whether stored in a file or not."""
        if self.file:
            # It was stored in a tempfile. Read it.
            self.file.seek(0)
            value = self.file.read()
            self.file.seek(0)
        else:
            value = self.value
        return value

    def process(self):
        """Execute the best-match processor for the given media type."""
        proc = None
        ct = self.content_type.value
        try:
            proc = self.processors[ct]
        except KeyError:
            toptype = ct.split('/', 1)[0]
            try:
                proc = self.processors[toptype]
            except KeyError:
                pass
        if proc is None:
            self.default_proc()
        else:
            proc(self)

    def default_proc(self):
        """Called if a more-specific processor is not found for the
        ``Content-Type``.
        """
        # Leave the fp alone for someone else to read. This works fine
        # for request.body, but the Part subclasses need to override this
        # so they can move on to the next part.
        pass


class Part(Entity):

    """A MIME part entity, part of a multipart entity."""

    # "The default character set, which must be assumed in the absence of a
    # charset parameter, is US-ASCII."
    attempt_charsets = ['us-ascii', 'utf-8']
    """A list of strings, each of which should be a known encoding.

    When the Content-Type of the request body warrants it, each of the given
    encodings will be tried in order. The first one to successfully decode the
    entity without raising an error is stored as
    :attr:`entity.charset<cherrypy._cpreqbody.Entity.charset>`. This defaults
    to ``['utf-8']`` (plus 'ISO-8859-1' for "text/\*" types, as required by
    `HTTP/1.1 <http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1>`_),
    but ``['us-ascii', 'utf-8']`` for multipart parts.
    """

    boundary = None
    """The MIME multipart boundary."""

    default_content_type = 'text/plain'
    """This defines a default ``Content-Type`` to use if no Content-Type header
    is given. The empty string is used for RequestBody, which results in the
    request body not being read or parsed at all. This is by design; a missing
    ``Content-Type`` header in the HTTP request entity is an error at best,
    and a security hole at worst. For multipart parts, however (this class),
    the MIME spec declares that a part with no Content-Type defaults to
    "text/plain".
    """

    # This is the default in stdlib cgi. We may want to increase it.
    maxrambytes = 1000
    """The threshold of bytes after which point the ``Part`` will store
    its data in a file (generated by
    :func:`make_file<cherrypy._cpreqbody.Entity.make_file>`)
    instead of a string. Defaults to 1000, just like the :mod:`cgi`
    module in Python's standard library.
    """

    def __init__(self, fp, headers, boundary):
        Entity.__init__(self, fp, headers)
        self.boundary = boundary
        self.file = None
        self.value = None

    def from_fp(cls, fp, boundary):
        headers = cls.read_headers(fp)
        return cls(fp, headers, boundary)
    from_fp = classmethod(from_fp)

    def read_headers(cls, fp):
        headers = httputil.HeaderMap()
        while True:
            line = fp.readline()
            if not line:
                # No more data--illegal end of headers
                raise EOFError("Illegal end of headers.")

            if line == ntob('\r\n'):
                # Normal end of headers
                break
            if not line.endswith(ntob('\r\n')):
                raise ValueError("MIME requires CRLF terminators: %r" % line)

            if line[0] in ntob(' \t'):
                # It's a continuation line.
                v = line.strip().decode('ISO-8859-1')
            else:
                k, v = line.split(ntob(":"), 1)
                k = k.strip().decode('ISO-8859-1')
                v = v.strip().decode('ISO-8859-1')

            existing = headers.get(k)
            if existing:
                v = ", ".join((existing, v))
            headers[k] = v

        return headers
    read_headers = classmethod(read_headers)

    def read_lines_to_boundary(self, fp_out=None):
        """Read bytes from self.fp and return or write them to a file.

        If the 'fp_out' argument is None (the default), all bytes read are
        returned in a single byte string.

        If the 'fp_out' argument is not None, it must be a file-like
        object that supports the 'write' method; all bytes read will be
        written to the fp, and that fp is returned.
        """
        endmarker = self.boundary + ntob("--")
        delim = ntob("")
        prev_lf = True
        lines = []
        seen = 0
        while True:
            line = self.fp.readline(1 << 16)
            if not line:
                raise EOFError("Illegal end of multipart body.")
            if line.startswith(ntob("--")) and prev_lf:
                strippedline = line.strip()
                if strippedline == self.boundary:
                    break
                if strippedline == endmarker:
                    self.fp.finish()
                    break

            line = delim + line

            if line.endswith(ntob("\r\n")):
                delim = ntob("\r\n")
                line = line[:-2]
                prev_lf = True
            elif line.endswith(ntob("\n")):
                delim = ntob("\n")
                line = line[:-1]
                prev_lf = True
            else:
                delim = ntob("")
                prev_lf = False

            if fp_out is None:
                lines.append(line)
                seen += len(line)
                if seen > self.maxrambytes:
                    fp_out = self.make_file()
                    for line in lines:
                        fp_out.write(line)
            else:
                fp_out.write(line)

        if fp_out is None:
            result = ntob('').join(lines)
            for charset in self.attempt_charsets:
                try:
                    result = result.decode(charset)
                except UnicodeDecodeError:
                    pass
                else:
                    self.charset = charset
                    return result
            else:
                raise cherrypy.HTTPError(
                    400,
                    "The request entity could not be decoded. The following "
                    "charsets were attempted: %s" % repr(self.attempt_charsets)
                )
        else:
            fp_out.seek(0)
            return fp_out

    def default_proc(self):
        """Called if a more-specific processor is not found for the
        ``Content-Type``.
        """
        if self.filename:
            # Always read into a file if a .filename was given.
            self.file = self.read_into_file()
        else:
            result = self.read_lines_to_boundary()
            if isinstance(result, basestring):
                self.value = result
            else:
                self.file = result

    def read_into_file(self, fp_out=None):
        """Read the request body into fp_out (or make_file() if None).

        Return fp_out.
        """
        if fp_out is None:
            fp_out = self.make_file()
        self.read_lines_to_boundary(fp_out=fp_out)
        return fp_out

Entity.part_class = Part

try:
    inf = float('inf')
except ValueError:
    # Python 2.4 and lower
    class Infinity(object):

        def __cmp__(self, other):
            return 1

        def __sub__(self, other):
            return self
    inf = Infinity()


comma_separated_headers = [
    'Accept', 'Accept-Charset', 'Accept-Encoding',
    'Accept-Language', 'Accept-Ranges', 'Allow',
    'Cache-Control', 'Connection', 'Content-Encoding',
    'Content-Language', 'Expect', 'If-Match',
    'If-None-Match', 'Pragma', 'Proxy-Authenticate',
    'Te', 'Trailer', 'Transfer-Encoding', 'Upgrade',
    'Vary', 'Via', 'Warning', 'Www-Authenticate'
]


class SizedReader:

    def __init__(self, fp, length, maxbytes, bufsize=DEFAULT_BUFFER_SIZE,
                 has_trailers=False):
        # Wrap our fp in a buffer so peek() works
        self.fp = fp
        self.length = length
        self.maxbytes = maxbytes
        self.buffer = ntob('')
        self.bufsize = bufsize
        self.bytes_read = 0
        self.done = False
        self.has_trailers = has_trailers

    def read(self, size=None, fp_out=None):
        """Read bytes from the request body and return or write them to a file.

        A number of bytes less than or equal to the 'size' argument are read
        off the socket. The actual number of bytes read are tracked in
        self.bytes_read. The number may be smaller than 'size' when 1) the
        client sends fewer bytes, 2) the 'Content-Length' request header
        specifies fewer bytes than requested, or 3) the number of bytes read
        exceeds self.maxbytes (in which case, 413 is raised).

        If the 'fp_out' argument is None (the default), all bytes read are
        returned in a single byte string.

        If the 'fp_out' argument is not None, it must be a file-like
        object that supports the 'write' method; all bytes read will be
        written to the fp, and None is returned.
        """

        if self.length is None:
            if size is None:
                remaining = inf
            else:
                remaining = size
        else:
            remaining = self.length - self.bytes_read
            if size and size < remaining:
                remaining = size
        if remaining == 0:
            self.finish()
            if fp_out is None:
                return ntob('')
            else:
                return None

        chunks = []

        # Read bytes from the buffer.
        if self.buffer:
            if remaining is inf:
                data = self.buffer
                self.buffer = ntob('')
            else:
                data = self.buffer[:remaining]
                self.buffer = self.buffer[remaining:]
            datalen = len(data)
            remaining -= datalen

            # Check lengths.
            self.bytes_read += datalen
            if self.maxbytes and self.bytes_read > self.maxbytes:
                raise cherrypy.HTTPError(413)

            # Store the data.
            if fp_out is None:
                chunks.append(data)
            else:
                fp_out.write(data)

        # Read bytes from the socket.
        while remaining > 0:
            chunksize = min(remaining, self.bufsize)
            try:
                data = self.fp.read(chunksize)
            except Exception:
                e = sys.exc_info()[1]
                if e.__class__.__name__ == 'MaxSizeExceeded':
                    # Post data is too big
                    raise cherrypy.HTTPError(
                        413, "Maximum request length: %r" % e.args[1])
                else:
                    raise
            if not data:
                self.finish()
                break
            datalen = len(data)
            remaining -= datalen

            # Check lengths.
            self.bytes_read += datalen
            if self.maxbytes and self.bytes_read > self.maxbytes:
                raise cherrypy.HTTPError(413)

            # Store the data.
            if fp_out is None:
                chunks.append(data)
            else:
                fp_out.write(data)

        if fp_out is None:
            return ntob('').join(chunks)

    def readline(self, size=None):
        """Read a line from the request body and return it."""
        chunks = []
        while size is None or size > 0:
            chunksize = self.bufsize
            if size is not None and size < self.bufsize:
                chunksize = size
            data = self.read(chunksize)
            if not data:
                break
            pos = data.find(ntob('\n')) + 1
            if pos:
                chunks.append(data[:pos])
                remainder = data[pos:]
                self.buffer += remainder
                self.bytes_read -= len(remainder)
                break
            else:
                chunks.append(data)
        return ntob('').join(chunks)

    def readlines(self, sizehint=None):
        """Read lines from the request body and return them."""
        if self.length is not None:
            if sizehint is None:
                sizehint = self.length - self.bytes_read
            else:
                sizehint = min(sizehint, self.length - self.bytes_read)

        lines = []
        seen = 0
        while True:
            line = self.readline()
            if not line:
                break
            lines.append(line)
            seen += len(line)
            if seen >= sizehint:
                break
        return lines

    def finish(self):
        self.done = True
        if self.has_trailers and hasattr(self.fp, 'read_trailer_lines'):
            self.trailers = {}

            try:
                for line in self.fp.read_trailer_lines():
                    if line[0] in ntob(' \t'):
                        # It's a continuation line.
                        v = line.strip()
                    else:
                        try:
                            k, v = line.split(ntob(":"), 1)
                        except ValueError:
                            raise ValueError("Illegal header line.")
                        k = k.strip().title()
                        v = v.strip()

                    if k in comma_separated_headers:
                        # Look up the current header name; the listing used
                        # an undefined name ('envname') here.
                        existing = self.trailers.get(k)
                        if existing:
                            v = ntob(", ").join((existing, v))
                    self.trailers[k] = v
            except Exception:
                e = sys.exc_info()[1]
                if e.__class__.__name__ == 'MaxSizeExceeded':
                    # Post data is too big
                    raise cherrypy.HTTPError(
                        413, "Maximum request length: %r" % e.args[1])
                else:
                    raise


class RequestBody(Entity):

    """The entity of the HTTP request."""

    bufsize = 8 * 1024
    """The buffer size used when reading the socket."""

    # Don't parse the request body at all if the client didn't provide
    # a Content-Type header. See
    # https://bitbucket.org/cherrypy/cherrypy/issue/790
    default_content_type = ''
    """This defines a default ``Content-Type`` to use if no Content-Type header
    is given. The empty string is used for RequestBody, which results in the
    request body not being read or parsed at all. This is by design; a missing
    ``Content-Type`` header in the HTTP request entity is an error at best,
    and a security hole at worst. For multipart parts, however, the MIME spec
    declares that a part with no Content-Type defaults to "text/plain"
    (see :class:`Part<cherrypy._cpreqbody.Part>`).
    """

    maxbytes = None
    """Raise ``MaxSizeExceeded`` if more bytes than this are read from
    the socket.
    """

    def __init__(self, fp, headers, params=None, request_params=None):
        Entity.__init__(self, fp, headers, params)

        # http://www.w3.org/Protocols/rfc2616/rfc2616-sec3.html#sec3.7.1
        # When no explicit charset parameter is provided by the
        # sender, media subtypes of the "text" type are defined
        # to have a default charset value of "ISO-8859-1" when
        # received via HTTP.
        if self.content_type.value.startswith('text/'):
            for c in ('ISO-8859-1', 'iso-8859-1', 'Latin-1', 'latin-1'):
                if c in self.attempt_charsets:
                    break
            else:
                self.attempt_charsets.append('ISO-8859-1')

        # Temporary fix while deprecating passing .parts as .params.
        self.processors['multipart'] = _old_process_multipart

        if request_params is None:
            request_params = {}
        self.request_params = request_params

    def process(self):
        """Process the request entity based on its Content-Type."""
        # "The presence of a message-body in a request is signaled by the
        # inclusion of a Content-Length or Transfer-Encoding header field in
        # the request's message-headers."
        # It is possible to send a POST request with no body, for example;
        # however, app developers are responsible in that case to set
        # cherrypy.request.process_body to False so this method isn't called.
        h = cherrypy.serving.request.headers
        if 'Content-Length' not in h and 'Transfer-Encoding' not in h:
            raise cherrypy.HTTPError(411)

        self.fp = SizedReader(self.fp, self.length,
                              self.maxbytes, bufsize=self.bufsize,
                              has_trailers='Trailer' in h)
        super(RequestBody, self).process()

        # Body params should also be a part of the request_params
        # add them in here.
        request_params = self.request_params
        for key, value in self.params.items():
            # Python 2 only: keyword arguments must be byte strings (type
            # 'str').
            if sys.version_info < (3, 0):
                if isinstance(key, unicode):
                    key = key.encode('ISO-8859-1')

            if key in request_params:
                if not isinstance(request_params[key], list):
                    request_params[key] = [request_params[key]]
                request_params[key].append(value)
            else:
                request_params[key] = value