Interview with Jiang Li, Researcher/Project Leader: Microsoft Research
Dr. Jiang Li joined
Microsoft Research Asia as a researcher in January 1999. His research
interests include video compression, image processing, video broadcast and
communication, peer-to-peer networking, realistic image synthesis and
image based rendering. Before joining Microsoft, Dr. Li was an associate
professor at Zhejiang University. Dr. Li received his Ph.D. in Applied
Mathematics from the State Key Laboratory of Computer Aided Design and
Computer Graphics, Zhejiang University in 1998. After obtaining his M.S.
degree in Optics from Zhejiang University in 1992, Dr. Li joined the
university's faculty, where he built copper vapor lasers and researched
laser-tissue interaction for photodynamic therapy. Dr. Li received B.S.
degrees in both Applied Physics and Applied Mathematics from Tsinghua
University in 1989. He completed a 3-year National Natural Science
Foundation project "Wave-Based Illumination Models for Computer Graphics"
at the end of 1999. He received the Best Paper Award at Chinagraph 1996
and Chinagraph 1998. Dr. Li leads the Microsoft Portrait project.
What exactly does Microsoft Portrait do?
Microsoft Portrait is a research prototype for mobile video
communication. It supports .NET Messenger Service, Session Initiation
Protocol and Internet Locator Service on PCs, Pocket PCs and Handheld
PCs. It runs on local area networks, dialup networks and even wireless
networks with bandwidths as low as 9.6 kilobits/second. Microsoft
Portrait delivers portrait-like video when users have low bandwidth and
displays full-color video when users have broadband. At low bandwidths,
portrait video has a clearer shape, smoother motion, shorter latency
and a much lower computational cost than conventional video
technologies. Microsoft Portrait aims to provide presence
notification and chat/voice/video functions anytime, anywhere, on any
device.
What do you like most about Portrait?
Microsoft Portrait is the first software in the world that provides
two-way video communication on Pocket PC. Two-way video communication on
Pocket PC means you can not only receive and display video on your
Pocket PC, but also capture and send video on your Pocket PC in
real-time. The latter feature is much more difficult than the former.
Usually mobile devices such as Pocket PC and Smartphone possess only
limited computational power and low connection bandwidth. Microsoft
Portrait can enable two-way video communication on Pocket PC because the
video technologies behind Microsoft Portrait possess two advantages: low
computational complexity and low bandwidth requirement.
Can we expect more changes/updates in the near future?
As I said in my answer to the first
question, we aim to provide presence notification and chat/voice/video
functions anytime, anywhere, on any device. Let's wait to see what's
How long was Microsoft Portrait in development, and how many people were
As you can see from the What's New page on the Microsoft
Portrait web site, the first version was posted in July 2001.
Development started about one year before that. After the
first release, we updated it version by version according to users'
feedback and our research progress. Many people were involved in the
project, and the number varied over time: in some periods a dozen, in
others just a couple. Interestingly, we have also had visiting
students join us, since we are a research institute as well.
What was the most difficult aspect of the development process?
It was reaching the low-bitrate target
at the beginning of the development process. The motivation for the project
came from the observation that current video communication software is
still not suitable for dialup users, who have only about 56 Kbps of bandwidth
(the actual available bandwidth is about 80% of that) and who are found in
most areas of the world. So we wanted to develop a codec that works at 10-20
Kbps and can therefore provide two-way video communication for dialup
users. Since there seemed to be little room left for improving
conventional DCT-based coding, we considered using line drawings in video
communication in which expressions are the most important and scenes are
We tried edge detection
algorithms to extract the outlines of the face, eyes, eyebrows,
mouth, etc., but the results were not very robust. In addition, the
visual effect was not satisfactory: if you draw only the
outlines with black lines, the hair areas come out white,
the color of the background. To avoid this, we
considered combining the outline image with a binary image
converted from a gray scale image by a threshold. In this case, hair
areas are always black, and the visual quality improved significantly.
Although the visual quality became better, the compression ratio could not
go higher because of the lines and dots from the outlines. We then
considered what would happen if we used only the binary image
converted from a gray scale image by a threshold, without any
outline information. Surprisingly, the visual effect was even
better. This is exactly what the current black/white video form is.
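
The thresholding step described above can be illustrated with a minimal
sketch. This is not the Microsoft Portrait implementation: the function
name to_bilevel, the fixed threshold of 128 and the convention that 1
means black are all assumptions made for illustration, and the interview
does not say how the threshold is chosen.

```python
def to_bilevel(gray, threshold=128):
    """Convert a gray scale frame (rows of 0-255 values) to a bi-level frame.

    Pixels darker than the threshold become 1 (black), all others 0 (white).
    The threshold value of 128 is an illustrative assumption.
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


# A tiny 2x4 "frame": bright background with a dark region (e.g. hair).
frame = [
    [200, 190, 40, 35],
    [210, 60, 30, 220],
]
bilevel = to_bilevel(frame)
print(bilevel)  # [[0, 0, 1, 1], [0, 1, 1, 0]]
```

Dark regions such as hair map to solid black areas, which matches the
observation that the thresholded image alone already looks better than
outlines drawn on a white background.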
The remaining problem was how to compress these
binary image sequences. As you know, there are international standards: JPEG
for still full-color images, MPEG for full-color motion images and JBIG
for still binary images, but no standard for binary motion images. By
analyzing the temporal correlation between successive frames and the
flexibility in scene presentation that bi-level images allow, we
achieved very high ratios in bi-level video compression. The description
would be long, so I have to stop here. Please refer to our papers, listed
details. It's interesting.
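
To give a feel for why temporal correlation helps, here is a deliberately
simple sketch: it XORs two successive bi-level frames and run-length
encodes the result. This is a generic textbook technique swapped in for
illustration, not the coding scheme used in Microsoft Portrait (which is
described in the team's papers); the function names xor_diff and
run_lengths are hypothetical.

```python
def xor_diff(prev_frame, curr_frame):
    """Return a frame that is 1 where the two bi-level frames differ."""
    return [
        [a ^ b for a, b in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev_frame, curr_frame)
    ]


def run_lengths(bits):
    """Run-length encode a flat bit list as [first_bit, run1, run2, ...]."""
    if not bits:
        return []
    runs = [bits[0]]
    count = 1
    for previous, current in zip(bits, bits[1:]):
        if current == previous:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs


prev_frame = [[0, 0, 1, 1], [0, 1, 1, 0]]
curr_frame = [[0, 0, 1, 1], [0, 1, 0, 0]]  # only one pixel changed

diff = xor_diff(prev_frame, curr_frame)
flat = [bit for row in diff for bit in row]
print(run_lengths(flat))  # [0, 6, 1, 1]
```

When successive frames are similar, the difference frame is mostly zeros
and collapses into a few long runs, which is the kind of redundancy a
bi-level video coder can exploit.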
During the development of this product were there any hilarious or
outlandish moments that stick out in your mind?
I don't think there was any hilarious or outlandish moment that stuck
out in my mind during the development of the research prototype. I would
say that the development process was smooth. One characteristic of the
process that may be worth noting is that the
software ran ahead of the hardware. What I mean is
that cameras for Pocket PCs were released only in recent months, but our
video coding algorithm for enabling video capture with
Pocket PC cameras was ready over a year ago. This is also why we could
so quickly release new versions that enabled two-way video
communication once Pocket PC cameras became available. As a researcher, I
feel this is really amazing.
What do you foresee as the future of mobile communications?
In my opinion, video communication will be reachable anytime, anywhere,
on any device. You may know that the business model of the video phone
failed in the past. But if video is integrated with many other
features in a device, rather than being its only feature, it will
be more acceptable. Current mobile phones help people reach each other
remotely by voice; future mobile devices will help people sense each other
visually. We can also imagine that other human senses, such as
touch and smell, will be enabled remotely in the future.
What is the bandwidth requirement for full color video?
About 50 Kbps for a QCIF size (176x144) video at an
acceptable frame rate.
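
As a rough back-of-the-envelope check on these figures, the sketch below
compares the raw bitrate of QCIF video with the 50 Kbps target and with
the usable dialup bandwidth mentioned earlier (about 80% of 56 Kbps). The
24 bits per pixel, the 10 frames per second and the reading of 1 Kbps as
1000 bits per second are assumptions made for illustration; the interview
gives none of them.

```python
QCIF_WIDTH, QCIF_HEIGHT = 176, 144
BITS_PER_PIXEL = 24        # assumption: uncompressed 24-bit color
FRAMES_PER_SECOND = 10     # assumption: the interview only says "acceptable"
FULL_COLOR_TARGET_BPS = 50_000  # "about 50 Kbps", taking 1 Kbps = 1000 bit/s


def raw_bitrate(width, height, bits_per_pixel, fps):
    """Uncompressed video bitrate in bits per second."""
    return width * height * bits_per_pixel * fps


def usable_dialup_bps(nominal_bps, usable_percent):
    """Usable share of a dialup link, using integer arithmetic."""
    return nominal_bps * usable_percent // 100


raw = raw_bitrate(QCIF_WIDTH, QCIF_HEIGHT, BITS_PER_PIXEL, FRAMES_PER_SECOND)
ratio = raw / FULL_COLOR_TARGET_BPS
dialup = usable_dialup_bps(56_000, 80)

print(f"raw QCIF bitrate: {raw} bit/s")
print(f"compression needed for 50 Kbps: about {ratio:.0f}:1")
print(f"usable dialup bandwidth: {dialup} bit/s")
```

Under these assumptions the 50 Kbps needed for full-color video is above
the roughly 44.8 Kbps a dialup user actually gets, which is consistent
with the 10-20 Kbps target set for the bi-level codec.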
Do you have anything
else to add?
I would like to take this opportunity to express my
sincere appreciation to the hundreds of thousands of users who sent us bug
reports, suggestions and warm encouragement, and also to hundreds of web
sites that reported, linked or reviewed Microsoft Portrait. It's these
active responses that make us feel our research is valuable and useful
to people. Finally I want to thank my colleagues Keman Yu and Gang Chen,
who made the greatest contributions to the project.
Microsoft Project Website